Process communication in a dynamic supervision setup

Phillipp · July 10, 2018, 10:42pm

Hey,

Let’s say I have a DynamicSupervisor and I want to start children on it. Each child is itself a Supervisor with a few children of its own, and together they act as one unit under the DynamicSupervisor.

Something like this:

DynamicSupervisor
-- Supervisor
------ Main
------ Worker_1
------ Worker_2
------ Something_else
-- Supervisor
------ Main
------ Worker_1
------ Worker_2
------ Something_else
...

So, in the example above, each “unit” has 4 GenServers. What would be the best way to let them communicate with each other easily? To do that, each process needs to find the pids of its related processes. Has anyone done something like this before and can recommend a good approach?

A simple use case here: Main tells the 3 other servers what to do, and they report back to Main.

dom · July 10, 2018, 11:51pm

https://hexdocs.pm/elixir/1.6/Registry.html
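A minimal sketch of the basic Registry workflow, assuming one application-wide registry with unique keys. The name UnitRegistry and the {unit, role} key shape are hypothetical choices for illustration:

```elixir
# Start one registry with unique keys (UnitRegistry is a hypothetical name).
# In a real app this would be a child in the application's supervision tree.
{:ok, _} = Registry.start_link(keys: :unique, name: UnitRegistry)

# The calling process plays the role of "Main" for "Unit 1" and registers
# itself under a {unit, role} key. The entry is removed when it exits.
{:ok, _} = Registry.register(UnitRegistry, {"Unit 1", :main}, nil)

# Any sibling that knows the unit name can now look up Main's pid...
[{main_pid, nil}] = Registry.lookup(UnitRegistry, {"Unit 1", :main})

# ...or use a :via name, the same kind of name GenServer.call/3 and
# GenServer.cast/2 accept, and resolve it to a pid when needed.
via = {:via, Registry, {UnitRegistry, {"Unit 1", :main}}}
^main_pid = GenServer.whereis(via)

send(main_pid, {:report, :worker_one, "done"})
```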

david_ex · July 11, 2018, 7:42am

It sounds like you’re looking for the service/worker pattern: https://zxq9.com/archives/1311

Coincidentally, I’ve written a series of blog posts that use it in practice: http://davidsulc.com/blog/2018/07/09/pooltoy-a-toy-process-pool-manager-in-elixir-1-6/

Another possibility is a hybrid approach where the unit’s Supervisor starts only Main. Main then creates a registry instance and starts its siblings with https://hexdocs.pm/elixir/1.6/Supervisor.html#start_child/2 where the provided child spec contains the registry name in its :start value.
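A rough sketch of this hybrid approach, under a few assumptions: the module names (HybridUnit.*), the per-unit registry naming scheme and the choice of :rest_for_one are all hypothetical. Registry names must be atoms, so one registry per unit means one new atom per unit, which is only reasonable when the number of units is bounded. Main defers starting its siblings to a message it sends itself, because calling Supervisor.start_child/2 from Main's init/1 would deadlock while the supervisor is still waiting for that init/1 to return:

```elixir
defmodule HybridUnit.Supervisor do
  use Supervisor

  def start_link(unit_id), do: Supervisor.start_link(__MODULE__, unit_id)

  @impl true
  def init(unit_id) do
    # Inside init/1, self() is this supervisor, so Main can be told where to
    # add its siblings. Only Main is started statically.
    children = [{HybridUnit.Main, {self(), unit_id}}]
    Supervisor.init(children, strategy: :rest_for_one)
  end
end

defmodule HybridUnit.Main do
  use GenServer

  def start_link({sup, unit_id}), do: GenServer.start_link(__MODULE__, {sup, unit_id})

  @impl true
  def init({sup, unit_id}) do
    # Defer start_child: the supervisor is blocked until this init/1 returns.
    send(self(), :start_siblings)
    {:ok, %{sup: sup, unit_id: unit_id, registry: nil}}
  end

  @impl true
  def handle_info(:start_siblings, %{sup: sup, unit_id: unit_id} = state) do
    # Hypothetical naming scheme: one atom-named registry per unit.
    registry = :"unit_registry_#{unit_id}"
    start_sibling(sup, {Registry, keys: :unique, name: registry})

    for role <- [:worker_one, :worker_two] do
      # The registry name travels inside the :start value of the child spec.
      start_sibling(sup, %{id: role, start: {HybridUnit.Worker, :start_link, [registry, role]}})
    end

    {:noreply, %{state | registry: registry}}
  end

  @impl true
  def handle_call(:registry, _from, state), do: {:reply, state.registry, state}

  # After a :rest_for_one restart of Main, the siblings are already running again.
  defp start_sibling(sup, spec) do
    case Supervisor.start_child(sup, spec) do
      {:ok, pid} -> pid
      {:error, {:already_started, pid}} -> pid
    end
  end
end

defmodule HybridUnit.Worker do
  use GenServer

  def start_link(registry, role) do
    # Each worker registers itself in its own unit's registry.
    GenServer.start_link(__MODULE__, role, name: {:via, Registry, {registry, role}})
  end

  @impl true
  def init(role), do: {:ok, role}
end
```

With :rest_for_one, a crash of Main also restarts the registry and workers started after it, and start_sibling/2 then simply accepts the already restarted children.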

Phillipp · July 11, 2018, 7:58am

I already had the idea of one Registry per “unit” in mind, but was stuck at the point of discovering that registry process. Your idea sounds quite doable. I need to tinker around a bit and see how it turns out with actual code.

dom · July 11, 2018, 9:08am

You don’t need a registry per unit if you register the processes with tuples, like {:unit_supervisor, unit_id}, {:unit_worker, 1, unit_id}, etc.

If your units don’t have a unique id already you can generate one (make_ref) in the supervisor’s init callback and pass it down as an argument to the children, via their child spec.
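A sketch of that idea, assuming a single application-wide Registry named MyApp.Registry with unique keys (a hypothetical name). Agents stand in for the real workers so the example stays short:

```elixir
defmodule Unit.Supervisor do
  use Supervisor

  # Assumption: MyApp.Registry is one app-wide Registry with unique keys.
  def via(key), do: {:via, Registry, {MyApp.Registry, key}}

  def start_link(unit_id \\ nil), do: Supervisor.start_link(__MODULE__, unit_id)

  @impl true
  def init(unit_id) do
    # No id given: make_ref/0 is unique enough for practical purposes.
    unit_id = unit_id || make_ref()

    # Register the supervisor itself; inside init/1, self() is the supervisor.
    {:ok, _} = Registry.register(MyApp.Registry, {:unit_supervisor, unit_id}, nil)

    # Agents stand in for real workers; each receives the unit id through its
    # child spec and registers under {:unit_worker, n, unit_id}.
    children =
      for n <- 1..2 do
        %{
          id: {:unit_worker, n},
          start:
            {Agent, :start_link,
             [fn -> {n, unit_id} end, [name: via({:unit_worker, n, unit_id})]]}
        }
      end

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

If such a supervisor is restarted without an explicit id, init/1 generates a fresh reference and the unit's names change; passing a stable id avoids that.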

Phillipp · July 11, 2018, 9:25am

I would actually like to have one Registry per unit, so that if something bad happens to one Registry, it won’t affect the other units.

kokolegorille · July 11, 2018, 10:19am

That would kind of defeat the purpose of Registry, which is to have one place to look up processes by key…

But nothing stops you from monitoring the other processes from Main and keeping track of them in a private ETS table, for example. This way Main acts as a small registry too: you would catch the exit notifications (EXIT messages with trapped links, :DOWN messages with monitors) and adjust your ETS entries.
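A sketch of Main doubling as a small registry, as suggested above. It uses Process.monitor/1, so Main receives :DOWN messages; catching EXIT instead would need links plus Process.flag(:trap_exit, true). The module name Unit.MainIndex is hypothetical:

```elixir
defmodule Unit.MainIndex do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def put(server, role, pid), do: GenServer.call(server, {:put, role, pid})
  def fetch(server, role), do: GenServer.call(server, {:fetch, role})

  @impl true
  def init(:ok) do
    # :private: only this process may read or write the table.
    {:ok, :ets.new(:unit_members, [:set, :private])}
  end

  @impl true
  def handle_call({:put, role, pid}, _from, table) do
    # Monitor the sibling; a :set table replaces any older entry for the role.
    ref = Process.monitor(pid)
    :ets.insert(table, {role, pid, ref})
    {:reply, :ok, table}
  end

  def handle_call({:fetch, role}, _from, table) do
    reply =
      case :ets.lookup(table, role) do
        [{^role, pid, _ref}] -> {:ok, pid}
        [] -> :error
      end

    {:reply, reply, table}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, table) do
    # Drop the entry whose monitor fired (a no-op if it was already replaced).
    :ets.match_delete(table, {:_, :_, ref})
    {:noreply, table}
  end

  def handle_info(_other, table), do: {:noreply, table}
end
```

Because the table is :private, every lookup goes through Main anyway, so a plain map in Main's state would work just as well; ETS mainly pays off if the table is :protected and siblings read it directly.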

Phillipp · July 11, 2018, 10:48am

I have already had cases where the GenServers tracked by my single Registry blew up the Registry, due to fast restarting or something similar. Then everything went down. That’s why I would like to have many Registries.

kokolegorille · July 11, 2018, 11:02am

You still can, but then what you need is a Registry of Registries.

michalmuskala · July 11, 2018, 11:16am

It should be perfectly enough to have one registry for the entire application. It’s performant enough.

Phillipp · July 11, 2018, 11:18am

@michalmuskala I am not saying it’s not performant enough. I just had problems in the past where the Registry crashed for some reason.

dom · July 11, 2018, 11:37am

Are you sure the supervisor of the registry didn’t crash, bringing down the registry together with it?

If you don’t trust Registry, you probably shouldn’t trust DynamicSupervisor either, they’re written by the same folks and do similar work monitoring and tracking processes.

Phillipp · July 11, 2018, 12:16pm

I can’t tell what happened exactly.

The setup in that project is like this:

Supervisor:
-- Registry
-- DynamicSupervisor
------ DynamicSupervisor
---------- GenServer
---------- GenServer
---------- GenServer
------ DynamicSupervisor
---------- GenServer
---------- GenServer
---------- GenServer
...

I had cases where the supervisor of the GenServers died and restarted, but I don’t remember the top-level Supervisor ever dying.

tcoopman · July 11, 2018, 12:34pm

What was the restart strategy of your top-level Supervisor? If it was one_for_all, then that’s exactly the behaviour you would see: every time your DynamicSupervisor crashed, everything under the Supervisor, Registry included, would be restarted.
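One common way to decouple the two (a sketch; the module names are hypothetical) is to start the Registry before the DynamicSupervisor and use :rest_for_one at the top level. If the Registry dies, everything started after it (the DynamicSupervisor and all units) is restarted too, because their registrations are gone; if the DynamicSupervisor dies, the Registry is left alone:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Started first: every unit registers itself here.
      {Registry, keys: :unique, name: MyApp.Registry},
      # Started second: holds all the units.
      {DynamicSupervisor, strategy: :one_for_one, name: MyApp.UnitSupervisor}
    ]

    # :rest_for_one restarts the units if the Registry dies (their entries are
    # gone with it), while a DynamicSupervisor crash leaves the Registry alone.
    Supervisor.start_link(children, strategy: :rest_for_one, name: MyApp.Supervisor)
  end
end
```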

Phillipp · July 11, 2018, 12:36pm

@tcoopman A normal one_for_one. It hasn’t happened recently, though.

Phillipp · July 11, 2018, 4:42pm

So, I played around a bit and created an example app.

I used a single Registry, which seems to work fine. Code can be found here:

For now, I just added the mechanism to start a new unit, which then starts a service and two worker processes. I’m going to add unit listing/deletion/state retrieval too. In the meantime, feel free to look over the code and suggest things that could be improved (concept-wise).

Here is an example console output:

iex(82)> MyApp.add_unit "Unit 1"
{:ok, #PID<0.1344.0>}
iex(83)> Unit 1/:worker_two got work to do
Unit 1: Message from :worker_two: I don't really wanna do the work today!
Unit 1/:worker_two got work to do
Unit 1: Message from :worker_two: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!

nil
iex(84)> MyApp.add_unit "Unit 2"
{:ok, #PID<0.1351.0>}
iex(85)> Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 2/:worker_one got work to do
Unit 2: Message from :worker_one: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 2/:worker_two got work to do
Unit 2: Message from :worker_two: I don't really wanna do the work today!
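The linked code isn't reproduced in this thread, so here is a hypothetical sketch of a comparable setup, not Phillipp's actual code: one shared Registry, each unit started under a DynamicSupervisor, and Main and the workers finding each other by {unit, role} keys. It assumes MyApp.Registry and MyApp.UnitSupervisor are already running; all module names are illustrative. Its printed messages follow the same format as the console output above:

```elixir
defmodule MyApp do
  # Hypothetical helpers. Assumes MyApp.Registry (a Registry with unique keys)
  # and MyApp.UnitSupervisor (a DynamicSupervisor) are already running.
  def via(unit, role), do: {:via, Registry, {MyApp.Registry, {unit, role}}}

  def add_unit(unit), do: DynamicSupervisor.start_child(MyApp.UnitSupervisor, {MyApp.Unit, unit})
end

defmodule MyApp.Unit do
  use Supervisor

  def start_link(unit), do: Supervisor.start_link(__MODULE__, unit)

  @impl true
  def init(unit) do
    children = [
      {MyApp.Main, unit},
      Supervisor.child_spec({MyApp.Worker, {unit, :worker_one}}, id: :worker_one),
      Supervisor.child_spec({MyApp.Worker, {unit, :worker_two}}, id: :worker_two)
    ]

    # The unit lives and dies as a whole.
    Supervisor.init(children, strategy: :one_for_all)
  end
end

defmodule MyApp.Main do
  use GenServer

  def start_link(unit), do: GenServer.start_link(__MODULE__, unit, name: MyApp.via(unit, :main))

  # Tell one worker of `unit` that it has work to do.
  def assign(unit, role), do: GenServer.cast(MyApp.via(unit, :main), {:assign, role})

  @impl true
  def init(unit), do: {:ok, %{unit: unit, reports: []}}

  @impl true
  def handle_cast({:assign, role}, state) do
    # No pids are stored: the worker is resolved through the registry on each send.
    GenServer.cast(MyApp.via(state.unit, role), :work)
    {:noreply, state}
  end

  def handle_cast({:report, role, text}, state) do
    IO.puts("#{state.unit}: Message from #{inspect(role)}: #{text}")
    {:noreply, %{state | reports: [{role, text} | state.reports]}}
  end

  @impl true
  def handle_call(:reports, _from, state), do: {:reply, Enum.reverse(state.reports), state}
end

defmodule MyApp.Worker do
  use GenServer

  def start_link({unit, role}),
    do: GenServer.start_link(__MODULE__, {unit, role}, name: MyApp.via(unit, role))

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast(:work, {unit, role} = state) do
    IO.puts("#{unit}/#{inspect(role)} got work to do")
    # Report back to Main, again found by {unit, :main} in the shared registry.
    GenServer.cast(MyApp.via(unit, :main), {:report, role, "I don't really wanna do the work today!"})
    {:noreply, state}
  end
end
```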

xlphs · July 12, 2018, 3:04am

I usually name my GenServers like {:global, {:whatever_name, unique_id}}, then at runtime look up the pid with GenServer.whereis/1. Do that every time you need to send a message, just in case the process was restarted in between. If only one instance will be running, using __MODULE__ as the name works too and saves the lookup.
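A sketch of that naming scheme (GlobalWorker and the {:worker_one, unit_id} name are hypothetical). Note that :global names are registered across all connected nodes, not just locally:

```elixir
defmodule GlobalWorker do
  use GenServer

  # Name scheme from the post: {:global, {name, unique_id}}. :global names can
  # be any term.
  def name(unit_id), do: {:global, {:worker_one, unit_id}}

  def start_link(unit_id), do: GenServer.start_link(__MODULE__, unit_id, name: name(unit_id))

  # Resolve the pid right before each raw send, in case the process was
  # restarted (and got a new pid) since the last message.
  def notify(unit_id, msg) do
    case GenServer.whereis(name(unit_id)) do
      nil ->
        {:error, :not_running}

      pid ->
        send(pid, msg)
        :ok
    end
  end

  @impl true
  def init(unit_id), do: {:ok, %{unit_id: unit_id, seen: []}}

  @impl true
  def handle_info(msg, state), do: {:noreply, %{state | seen: [msg | state.seen]}}

  @impl true
  def handle_call(:seen, _from, state), do: {:reply, Enum.reverse(state.seen), state}
end
```

GenServer.call/3 and GenServer.cast/2 already resolve a {:global, ...} name on every call, so the explicit GenServer.whereis/1 step is mainly needed for plain send/2 or when you want the pid itself.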

I doubt you need Registry for this.

Phillipp · July 24, 2018, 11:17am

Got another feature I would like to have.

In my setup, say I have started some units. Now I want to “label” them.
Let’s say I have Units 1, 2 and 3. Units 1 and 2 both get the label/value pairs “foo/bar” and “foo/baz”, and Unit 3 gets only “bla/blub”.

Now I want to reach out to all units with the label/value “foo/bar”, which in this case are Units 1 and 2.

There are two ways of doing this, each with its own tradeoffs:

1. Using another Registry with duplicate keys. When I tell a unit to register under a certain label/value, it registers itself in that new Registry. The good thing is, if that unit dies, the Registry removes its entries. The bad thing is, if the unit dies and restarts, the label information is lost. I would need to store it somewhere else to recover it.
2. Using a simple GenServer to store a map of label/value entries and their associated “unit names” (not pids). The good thing is, if a unit gets restarted, I don’t lose the label information. The bad thing is, the reference map could get outdated and inconsistent.
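Option 1 could look roughly like this (a sketch; UnitLabels and the :unit_labels registry name are hypothetical). Each unit registers the label as the key and the label's value as the registry value, so Registry.match/3 can select by value:

```elixir
defmodule UnitLabels do
  # Hypothetical registry name. keys: :duplicate lets many units (and one
  # unit several times) register under the same label.
  @registry :unit_labels

  def start_link, do: Registry.start_link(keys: :duplicate, name: @registry)

  # Must be called from inside the unit process: Registry ties the entry to
  # self() and removes it automatically when that process exits.
  def put(label, value), do: Registry.register(@registry, label, value)

  # Pids of all units whose entry for `label` has exactly `value`.
  def units_with(label, value) do
    @registry
    |> Registry.match(label, value)
    |> Enum.map(fn {pid, _value} -> pid end)
    |> Enum.uniq()
  end

  # Send `msg` to every unit carrying label/value.
  def broadcast(label, value, msg) do
    Enum.each(units_with(label, value), &send(&1, msg))
  end
end
```

As described in option 1, the entries disappear together with the unit process, so a restarted unit has to call put/2 again with labels recovered from somewhere else.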

Which way would you prefer?

xlphs · July 27, 2018, 2:20pm

I think you are looking for https://github.com/uwiger/gproc , one of its features is “Register a process under several aliases”

Phillipp · July 27, 2018, 2:32pm

I am sure I will run into problem 1 from above too.