Process communication in a dynamic supervision setup

supervision-tree
supervision-strategies
dynamicsupervisor

#1

Hey,

let’s say I have a DynamicSupervisor and I want to start children under it. Each child is itself a Supervisor with a few children of its own, which together act as one unit under the DynamicSupervisor.

Something like this:

DynamicSupervisor
-- Supervisor
------ Main
------ Worker_1
------ Worker_2
------ Something_else
-- Supervisor
------ Main
------ Worker_1
------ Worker_2
------ Something_else
...

So, our “unit” has 4 GenServers in the example above. What would be the best way to let them easily communicate with each other? I need to find out the pids of the related processes. Has anyone done something like this before and can recommend a good way?

A simple use case here is that Main tells the 3 other servers what to do and they report back to Main.


#2

https://hexdocs.pm/elixir/1.6/Registry.html


#3

It sounds like you’re looking for the service/worker pattern: https://zxq9.com/archives/1311

Coincidentally, I’ve written a series of blog posts that use it in practice: http://davidsulc.com/blog/2018/07/09/pooltoy-a-toy-process-pool-manager-in-elixir-1-6/

Another possibility is to use a hybrid approach where Supervisor starts Main. Then Main will create a registry instance, and start its siblings with https://hexdocs.pm/elixir/1.6/Supervisor.html#start_child/2 where the provided child spec contains the registry name within the :start value.


#4

I already had the idea of one Registry per “unit” in mind but was stuck at the point of discovering that registry process. Your idea sounds quite doable. I need to tinker around a bit and see how it turns out with actual code.


#5

You don’t need a registry per unit if you register the processes with tuples, like {:unit_supervisor, unit_id}, {:unit_worker, 1, unit_id}, etc.

If your units don’t have a unique id already you can generate one (make_ref) in the supervisor’s init callback and pass it down as an argument to the children, via their child spec.
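A minimal sketch of the single-Registry approach, assuming a hypothetical `UnitSketch` namespace, one generic worker module for every role, and a unit id generated by the caller rather than in the supervisor’s init (all of these names are made up for illustration):

```elixir
defmodule UnitSketch.Worker do
  use GenServer

  # `unit_id` and `role` are hypothetical arguments passed down via the child spec.
  def start_link({unit_id, role}) do
    GenServer.start_link(__MODULE__, {unit_id, role}, name: via(unit_id, role))
  end

  # Builds the :via tuple that resolves {role, unit_id} through the shared Registry.
  def via(unit_id, role), do: {:via, Registry, {UnitSketch.Registry, {role, unit_id}}}

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:whoami, _from, state), do: {:reply, state, state}
end

defmodule UnitSketch.UnitSupervisor do
  use Supervisor

  def start_link(unit_id), do: Supervisor.start_link(__MODULE__, unit_id)

  @impl true
  def init(unit_id) do
    children =
      for role <- [:main, :worker_1, :worker_2] do
        # Each child needs a distinct id because they all share one module.
        Supervisor.child_spec({UnitSketch.Worker, {unit_id, role}}, id: role)
      end

    Supervisor.init(children, strategy: :one_for_one)
  end
end

# One Registry and one DynamicSupervisor for the whole application.
{:ok, _} = Registry.start_link(keys: :unique, name: UnitSketch.Registry)
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: UnitSketch.Units)

unit_id = make_ref()

{:ok, _sup} =
  DynamicSupervisor.start_child(UnitSketch.Units, {UnitSketch.UnitSupervisor, unit_id})

# Any process that knows the unit id can reach a sibling without holding its pid.
whoami = GenServer.call(UnitSketch.Worker.via(unit_id, :worker_1), :whoami)
```

Because the name is resolved on every call, a restarted worker is found again under the same key as soon as it has re-registered.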


#6

I actually would like to have one Registry per unit. If something bad happens with the Registry, it won’t affect the other units.


#7

That would kind of defeat the role of Registry, which is to have one place in which to look up processes by key…

But nothing forbids You from monitoring the other processes from Main, with a private ETS table, for example. This way Main would be used as a small registry too. You would handle the :DOWN messages (or EXIT, if You link and trap exits), and adjust your ETS entries.
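Something like the following could work as a rough sketch. `MainSketch`, `register/3` and `lookup/2` are hypothetical names, and it uses monitors, so a dead sibling shows up as a `:DOWN` message in `handle_info/2`:

```elixir
defmodule MainSketch do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Hypothetical API: a sibling (or whoever starts it) announces a pid under a role.
  def register(main, role, pid \\ self()), do: GenServer.call(main, {:register, role, pid})

  def lookup(main, role), do: GenServer.call(main, {:lookup, role})

  @impl true
  def init(:ok) do
    # :private means only this process can read or write the table.
    {:ok, :ets.new(:siblings, [:set, :private])}
  end

  @impl true
  def handle_call({:register, role, pid}, _from, table) do
    ref = Process.monitor(pid)
    :ets.insert(table, {role, pid, ref})
    {:reply, :ok, table}
  end

  def handle_call({:lookup, role}, _from, table) do
    reply =
      case :ets.lookup(table, role) do
        [{^role, pid, _ref}] -> {:ok, pid}
        [] -> :error
      end

    {:reply, reply, table}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, table) do
    # Drop whichever entry was tied to this monitor reference.
    :ets.match_delete(table, {:_, :_, ref})
    {:noreply, table}
  end
end

{:ok, main} = MainSketch.start_link()

# Stand-in for a sibling worker: it just waits until it is told to stop.
worker =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

:ok = MainSketch.register(main, :worker_1, worker)
found = MainSketch.lookup(main, :worker_1)
```

Since the table is private, every lookup goes through Main, so Main becomes a bottleneck and a single point of failure for its own unit, but nothing outside the unit is affected.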


#8

I already had cases where the GenServers that were monitored by my single Registry blew up the Registry due to fast restarting or whatever. Then everything went down. That’s why I like to have many Registries.


#9

You still can, but what You need is a Registry of Registries :slight_smile:


#10

It should be perfectly enough to have one registry for the entire application. It’s performant enough.


#11

@michalmuskala I am not saying it’s not performant enough. I just had problems with it in the past where the Registry crashed for some reason.


#12

Are you sure the supervisor of the registry didn’t crash, bringing down the registry together with it?

If you don’t trust Registry, you probably shouldn’t trust DynamicSupervisor either, they’re written by the same folks and do similar work monitoring and tracking processes.


#13

I can’t tell what happened exactly.

The setup in that project is like this:

Supervisor:
-- Registry
-- DynamicSupervisor
------ DynamicSupervisor
---------- GenServer
---------- GenServer
---------- GenServer
------ DynamicSupervisor
---------- GenServer
---------- GenServer
---------- GenServer
...

I had cases where the Supervisor supervising the GenServers died and restarted, but I can’t remember the top-level supervisor ever dying.


#14

What was your restart policy for the Supervisor? If it was one_for_all then that’s exactly the behaviour you would be seeing. Every time your DynamicSupervisor crashed, everything under the Supervisor would be restarted.


#15

@tcoopman A normal one_for_one. It didn’t happen recently tho.


#16

So, I played around a bit and created an example app.

I used a single Registry which seems to work fine. Code can be found here:

For now, I just added the mechanism to start a new unit which then starts a service and two worker processes. Gonna add unit listing/deletion/state retrieval too. In the meantime, feel free to look over the code and suggest things that could be improved (concept-wise).

Here is an example console output:

iex(82)> MyApp.add_unit "Unit 1"
{:ok, #PID<0.1344.0>}
iex(83)> Unit 1/:worker_two got work to do
Unit 1: Message from :worker_two: I don't really wanna do the work today!
Unit 1/:worker_two got work to do
Unit 1: Message from :worker_two: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!

nil
iex(84)> MyApp.add_unit "Unit 2"
{:ok, #PID<0.1351.0>}
iex(85)> Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 2/:worker_one got work to do
Unit 2: Message from :worker_one: I don't really wanna do the work today!
Unit 1/:worker_one got work to do
Unit 1: Message from :worker_one: I don't really wanna do the work today!
Unit 2/:worker_two got work to do
Unit 2: Message from :worker_two: I don't really wanna do the work today!

#17

I usually name my GenServers like {:global, {:whatever_name, unique_id}}, then at runtime look up the pid with GenServer.whereis/1. I do that every time I need to send a message, just in case the process was restarted in between. If only one instance will be running, using __MODULE__ as the name works too and saves the lookup.
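Roughly like this, as a small sketch (`GlobalNamed` and its `name/1` helper are made-up names for illustration):

```elixir
defmodule GlobalNamed do
  use GenServer

  # Builds the {:global, term} name for a given unique id.
  def name(unique_id), do: {:global, {:whatever_name, unique_id}}

  def start_link(unique_id) do
    GenServer.start_link(__MODULE__, unique_id, name: name(unique_id))
  end

  @impl true
  def init(unique_id), do: {:ok, unique_id}

  @impl true
  def handle_call(:id, _from, unique_id), do: {:reply, unique_id, unique_id}
end

{:ok, pid} = GlobalNamed.start_link(42)

# Look the pid up at the moment it is needed instead of caching it.
looked_up = GenServer.whereis(GlobalNamed.name(42))

# Passing the name itself to GenServer.call/2 also resolves it on each call.
reply = GenServer.call(GlobalNamed.name(42), :id)

# A name nobody registered resolves to nil.
missing = GenServer.whereis(GlobalNamed.name(:missing))
```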

I doubt you need Registry for this.


#18

Got another feature I would like to have.

In my setup, say I have started some units and now want to “label” them.
Let’s say I have Unit 1, 2 and 3. Unit 1 and 2 get the label/value “foo/bar” and “foo/baz” and Unit 3 gets only “bla/blub”.

Now I want to reach out to all Units with the label/value “foo/bar” which in this case are Unit 1 and 2.

There are two ways of doing this which have their own tradeoffs:

  1. Using another Registry with duplicate keys. When I tell a unit to register under a certain label/value, it registers itself in that new Registry. The good thing is, if that unit dies, the Registry will remove the entries. The bad thing is, if the unit dies and restarts, the label information is lost. I would need to store it somewhere else to recover it.

  2. Using a simple GenServer to store a map of label/value entries and their connected “unit names” (not pids). The good thing is, if a unit gets restarted, I don’t lose the label information. The bad thing is, the reference map could get outdated and inconsistent.
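To make option 1 a bit more concrete, here is a minimal sketch. It assumes a hypothetical `LabelSketch.Registry`, represents a label/value pair as a tuple, and uses plain spawned processes as stand-ins for the units:

```elixir
# A second Registry that allows many processes under the same key.
{:ok, _} = Registry.start_link(keys: :duplicate, name: LabelSketch.Registry)

parent = self()

# Stand-in for a unit: registers itself under its labels, then answers one ping.
start_unit = fn unit_name, labels ->
  spawn(fn ->
    for label <- labels do
      {:ok, _} = Registry.register(LabelSketch.Registry, label, unit_name)
    end

    send(parent, {:registered, unit_name})

    receive do
      {:ping, from} -> send(from, {:pong, unit_name})
    end
  end)
end

start_unit.("Unit 1", [{"foo", "bar"}, {"foo", "baz"}])
start_unit.("Unit 2", [{"foo", "bar"}, {"foo", "baz"}])
start_unit.("Unit 3", [{"bla", "blub"}])

# Wait until every unit has finished registering before dispatching.
for _ <- 1..3 do
  receive do
    {:registered, _unit_name} -> :ok
  end
end

# Reach every process registered under the label/value "foo/bar".
Registry.dispatch(LabelSketch.Registry, {"foo", "bar"}, fn entries ->
  for {pid, _unit_name} <- entries, do: send(pid, {:ping, parent})
end)

replies =
  for _ <- 1..2 do
    receive do
      {:pong, unit_name} -> unit_name
    after
      1_000 -> :timeout
    end
  end
```

The entries disappear together with the registering process, which is the tradeoff described in option 1: a restarted unit has to get its labels from somewhere and register them again.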

Which way would you prefer?


#19

I think you are looking for https://github.com/uwiger/gproc , one of its features is “Register a process under several aliases”


#20

I am sure I will run into problem 1 from above too.