Fastest way for a GenServer to communicate with all children of a supervisor?

Hello,

Part of my application currently has a supervisor hierarchy that looks something like this:

[Image: supex, diagram of the supervisor hierarchy]

As you can see, I have a Manager GenServer and a Childs Supervisor that will contain a lot of children (it is a normal Supervisor for now, but in the future I may want to start the children on demand, in which case it could become a DynamicSupervisor).

My question is about how the Manager GenServer communicates with the children. The Manager GenServer needs to broadcast messages to all of the children, and this can happen very frequently (multiple times a second), so I have a performance concern.

So, what kind of communication strategy should I use in this case? I’m fairly new to OTP, so I’m not 100% sure, and I am contemplating the following solutions:

  1. Create a function inside the Childs Supervisor that iterates over all its children and does a GenServer.call to each. I’m not sure whether this is efficient, or whether it will bottleneck all communication if one child blocks the call for some reason;

  2. Have each child send a message with its PID to the Manager GenServer, store the PIDs in a list, and iterate through that list sending a GenServer.call to each;

  3. Use a PubSub system like Phoenix.PubSub. This seems like a good solution, but as far as I know Phoenix.PubSub is global and used more for the distributed case. I’m not sure whether using it would be over-engineering, or whether it might become a bottleneck;

  4. Use the Registry. This seems like another good solution, but I have only seen examples of a global Registry (global within the BEAM node, that is). In my case I only want to handle communication from the Manager GenServer to all the children. Is it possible to create a Registry that is local to the Parent Supervisor?

Any help would be much appreciated.

Thanks a lot!


Supervisor has the which_children function (and more…)

https://hexdocs.pm/elixir/Supervisor.html#which_children/1

You should benchmark whether sending a message to every child in that list fits within your broadcast frequency.
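As a rough starting point for such a benchmark, here is a minimal sketch. The module names (BroadcastSketch, BroadcastSketch.Child) and the {:broadcast, msg} message shape are made up for illustration and are not from the project discussed in this thread. It casts to every live child returned by Supervisor.which_children/1 and times the loop with :timer.tc/1. Note that this only measures how long the sends take, not how long the children take to process the message.

```elixir
defmodule BroadcastSketch do
  # Hypothetical child process: it just records every message it receives.
  defmodule Child do
    use GenServer

    def start_link(id), do: GenServer.start_link(__MODULE__, id)

    @impl true
    def init(id), do: {:ok, %{id: id, received: []}}

    @impl true
    def handle_cast({:broadcast, msg}, state) do
      {:noreply, %{state | received: [msg | state.received]}}
    end

    @impl true
    def handle_call(:received, _from, state) do
      {:reply, Enum.reverse(state.received), state}
    end
  end

  # Casts `msg` to every running child of `sup` and returns their pids.
  # which_children/1 may report :restarting or :undefined instead of a pid,
  # so those entries are filtered out.
  def broadcast(sup, msg) do
    for {_id, pid, _type, _modules} <- Supervisor.which_children(sup), is_pid(pid) do
      GenServer.cast(pid, {:broadcast, msg})
      pid
    end
  end
end

# Start 100 children under a plain supervisor (each needs a unique id).
children = for i <- 1..100, do: Supervisor.child_spec({BroadcastSketch.Child, i}, id: i)
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# :timer.tc/1 returns {elapsed_microseconds, result}.
{micros, pids} = :timer.tc(fn -> BroadcastSketch.broadcast(sup, :tick) end)
IO.puts("broadcast to #{length(pids)} children took #{micros} µs")
```

If the measured time is comfortably below the interval between two broadcasts, the simple approach is probably good enough.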


I’d go the same way as @kokolegorille, but if for some reason that does not satisfy your requirements, I would implement a pub-sub pattern with a Registry.

Does every child process use every message broadcast by the Manager GenServer?

(2), (3) and (4) are common techniques, each with its own pros and cons.
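To illustrate the Registry-based pub-sub idea, here is a minimal sketch. The names RegistrySketch, RegistrySketch.Child and RegistrySketch.Local, the :broadcast topic, and the message shape are all hypothetical. The Registry is an ordinary named process, so it can be started only where it is needed; in a real application it could be listed as a child such as `{Registry, keys: :duplicate, name: RegistrySketch.Local}` in the parent supervisor, before the children that register with it.

```elixir
defmodule RegistrySketch.Child do
  use GenServer

  def start_link({registry, id}) do
    GenServer.start_link(__MODULE__, {registry, id})
  end

  @impl true
  def init({registry, id}) do
    # Subscribe this process to the (hypothetical) :broadcast topic.
    {:ok, _owner} = Registry.register(registry, :broadcast, id)
    {:ok, %{id: id, received: []}}
  end

  # Messages dispatched with send/2 arrive in handle_info/2.
  @impl true
  def handle_info({:broadcast, msg}, state) do
    {:noreply, %{state | received: [msg | state.received]}}
  end

  @impl true
  def handle_call(:received, _from, state) do
    {:reply, Enum.reverse(state.received), state}
  end
end

defmodule RegistrySketch do
  # Sends `msg` to every process registered under :broadcast.
  def broadcast(registry, msg) do
    Registry.dispatch(registry, :broadcast, fn entries ->
      for {pid, _value} <- entries, do: send(pid, {:broadcast, msg})
    end)
  end
end

# A :duplicate registry lets many processes register under the same key.
{:ok, _registry} = Registry.start_link(keys: :duplicate, name: RegistrySketch.Local)

children =
  for i <- 1..10 do
    Supervisor.child_spec({RegistrySketch.Child, {RegistrySketch.Local, i}}, id: i)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

:ok = RegistrySketch.broadcast(RegistrySketch.Local, :tick)
```

Because each child registers itself in init/1, the Manager does not need to track PIDs, and the Registry removes an entry automatically when the registered process exits.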

Yes, the Manager will send messages that every child process needs to receive and process.

So, I created a simple project to test the solutions, https://github.com/sezaru/msg_bench_test.

I have implemented 3 solutions so far:

  1. Each child sends its PID to the Manager, and the Manager loops through the PIDs and sends a GenServer.call message to each;

  2. Each child sends its PID to the Manager, and the Manager loops through the PIDs and sends a GenServer.cast message to each;

  3. Create a local Registry; the children subscribe to receive a type of message, and the Manager sends that message through the Registry.

I didn’t implement the Supervisor.which_children one, since its performance should be very similar to 1) and 2).

If you are interested in testing the code, you can start IEx and run the following:

For 1):

    MsgBench.run_msg_manager(:msg_call)
    MsgBench.send_msg(:msg_call)

For 2):

    MsgBench.run_msg_manager(:msg_cast)
    MsgBench.send_msg(:msg_cast)

For 3):

    MsgBench.run_msg_manager(:registry)
    MsgBench.send_msg(:registry)

1) simply times out, since it does the calls sequentially and takes a long time to finish (I added a :timer.sleep(1_000) to each child when it processes the message).

2) seems to work great, but I feel that I shouldn’t abuse GenServer.cast too much, and I guess I wouldn’t have any guarantee that the message was delivered to all children.

3) looks like it works as well as 2), but I’m not sure (and couldn’t find it in the documentation) whether I have any kind of guarantee that my children will receive the message when I call Registry.dispatch.

Also, I’m not sure how to modify my code so that I can really see the difference in overhead between the solutions.

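Your 1) times out because you implemented it inside a handle_call and call it from another process, so the outer call hits the default 5-second GenServer.call timeout while the inner calls are still running. But this problem is out of scope. You could pass :infinity as the timeout.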

You could call all children concurrently:

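    reply =
      state.childs
      |> Map.values()
      # Start one task per child so that the calls run concurrently.
      |> Enum.map(fn pid -> Task.async(fn -> GenServer.call(pid, {:received_msg, "hello"}) end) end)
      # Collect every reply (Task.await/1 has a 5-second default timeout).
      |> Enum.map(&Task.await/1)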

Edit: but this is a poor man’s version of what GenServer.multi_call does (multi_call itself targets a locally registered name on a set of nodes, not a list of PIDs).

Or use Task.async_stream to limit the number of concurrent calls.
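A minimal sketch of the Task.async_stream variant, under some assumptions: the AsyncStreamSketch.Child module, the 50 ms simulated work, and the limits chosen for max_concurrency and timeout are illustrative and not taken from the benchmark project. Only the {:received_msg, "hello"} message shape comes from the snippet above.

```elixir
defmodule AsyncStreamSketch.Child do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_call({:received_msg, msg}, _from, id) do
    # Simulate some work before replying.
    Process.sleep(50)
    {:reply, {id, msg}, id}
  end
end

# Start 20 standalone children (unsupervised, to keep the sketch short).
pids =
  for i <- 1..20 do
    {:ok, pid} = AsyncStreamSketch.Child.start_link(i)
    pid
  end

replies =
  pids
  # At most 5 calls are in flight at any time; results keep the input order.
  |> Task.async_stream(&GenServer.call(&1, {:received_msg, "hello"}),
    max_concurrency: 5,
    timeout: 10_000
  )
  |> Enum.map(fn {:ok, reply} -> reply end)

IO.inspect(length(replies), label: "replies collected")
```

With 20 children, 50 ms of work each, and 5 concurrent calls, the whole broadcast should take roughly four rounds of 50 ms instead of 20 sequential ones. The max_concurrency value is the knob that trades latency for load.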

Calling your children with GenServer.call provides another guarantee: all calls will have completed before you start calling your children again. The other solutions do not give this guarantee. But you may not care about it.
