Optimal process discovery: Registry lookup vs `Supervisor.which_children`

I have a DynamicSupervisor that creates a sub-tree of room-related processes for each room, where only one process in each sub-tree is actually needed from the outside:

              RoomsDynamicSupervisor
              |                    |
       RoomSupervisor        RoomSupervisor
        |          |           |   |   |
    ProcessB    ProcessA          ...
    |   |   |
  Task Task Task

ProcessA will receive domain messages, so it needs to be discoverable by clients (e.g. a web controller). ProcessB manages auxiliary functionality that ProcessA uses internally. RoomSupervisor, ProcessA and ProcessB are all registered via Registry, but when I create a room I really only need a reference to ProcessA for external use.

Each RoomSupervisor is created on demand. I plan to hibernate them and, after a substantial period of inactivity, shut them down completely. So for each domain operation on ProcessA I need to ensure it is actually started (along with its RoomSupervisor). For this I use code like this:

defmodule RoomsDynamicSupervisor do
  # ...

  # Returns the ProcessA pid for the room, starting the room's sub-tree first
  # if its RoomSupervisor is not registered yet. A clause that fails to match
  # falls through `with`, so the caller may also get [] or an error tuple.
  @spec server_process(Room.id()) :: pid() | [] | :ignore | {:error, term()}
  def server_process(room_id) do
    existing_room_entry = ProcessRegistry.lookup(RoomSupervisor.registry_key(room_id))

    if Enum.empty?(existing_room_entry) do
      with {:ok, _} <- DynamicSupervisor.start_child(__MODULE__, {RoomSupervisor, room_id}),
           [{process_a, _}] <- ProcessRegistry.lookup(ProcessA.registry_key(room_id)) do
        process_a
      end
    else
      with [{process_a, _}] <- ProcessRegistry.lookup(ProcessA.registry_key(room_id)) do
        process_a
      end
    end
  end
end

So when a client needs to perform an operation on ProcessA, it first has to get its pid using server_process(room_id). That call is only really needed to start the process if it is not running yet (otherwise I could just use via_tuple(room_id), as the process is registered).

The question is: what is the optimal way to get the ProcessA pid? The options are:

  • Use Registry. If the registry contains the RoomSupervisor for room_id, we can assume its ProcessA child is also started and look it up in the registry as well. Is this a viable strategy? My worry is that ProcessA may be down or restarting, so we can’t be sure we get the pid, and if the registry lookup returns an empty list it won’t be obvious what the problem is.
  • Use RoomSupervisor.which_children, then iterate over its 2 children to find ProcessA. This additionally states explicitly whether the process is restarting. Here my worry is performance: will it be as fast as (or close to) a registry lookup? The memory penalty noted in the docs is only relevant for a large number of children, but I only have 2 and don’t expect many more than that. Still, I need to iterate over the children.

Edit: I like which_children more due to its increased explicitness:
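
For the first option, one way to make an empty lookup less ambiguous is to wrap it in a tagged result. This is only a minimal sketch: the `DiscoverySketch` module, its registry name and the `{:process_a, room_id}` key shape are hypothetical stand-ins for ProcessRegistry and ProcessA.registry_key/1 from the question.

```elixir
defmodule DiscoverySketch do
  # Hypothetical registry name; the question uses its own ProcessRegistry wrapper.
  @registry DiscoverySketch.Registry

  # Hypothetical key shape standing in for ProcessA.registry_key/1.
  def key(room_id), do: {:process_a, room_id}

  # Turns the bare list returned by Registry.lookup/2 into a tagged tuple,
  # so callers can pattern match on the "not registered" case explicitly.
  @spec whereis(term()) :: {:ok, pid()} | {:error, :not_registered}
  def whereis(room_id) do
    case Registry.lookup(@registry, key(room_id)) do
      [{pid, _value}] -> {:ok, pid}
      [] -> {:error, :not_registered}
    end
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: DiscoverySketch.Registry)

# Register the current process as if it were ProcessA for room 1.
{:ok, _} = Registry.register(DiscoverySketch.Registry, DiscoverySketch.key(1), nil)
```

The tagged error does not say why the process is missing (never started, restarting, or shut down), but it gives the caller one obvious place to decide whether to start the room or report a failure.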

  @spec server_process(Room.id()) :: Supervisor.child() | :restarting
  def server_process(room_id) do
    # Both branches yield the RoomSupervisor pid, not ProcessA itself.
    case ProcessRegistry.lookup(RoomSupervisor.registry_key(room_id)) do
      [] -> DynamicSupervisor.start_child(__MODULE__, {RoomSupervisor, room_id}) |> elem(1)
      [{room_supervisor, _}] -> room_supervisor
    end
    |> Supervisor.which_children()
    |> Enum.find(fn {id, _, _, _} -> id == ProcessA end)
    |> elem(1)
  end

I’d think that iterating the children of the supervisor would be O(n) on the number of children, while the lookup would be constant?

You could have ProcessA put its pid into an ETS table and remove it in a terminate callback? I’m pretty sure that’s what the registry does, though.

Supervisor.which_children does a call under the hood to the Supervisor process (which is a GenServer), so concurrent callers are serialized through that single process and each one blocks until the supervisor replies.

If you have a lot of clients trying to call which_children, this can become a bottleneck.

Registry.lookup has the scalability advantage, as the lookups do not block each other.
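
To make the difference concrete, here is a rough sketch of both discovery styles side by side. The `CompareSketch.Registry` name, the `:process_a` / `:process_b` ids and the use of Agent as a stand-in for the real processes are all assumptions made for illustration.

```elixir
{:ok, _} = Registry.start_link(keys: :unique, name: CompareSketch.Registry)

# Builds a :via name so each child registers itself in the Registry on start.
via = fn key -> {:via, Registry, {CompareSketch.Registry, key}} end

children = [
  %{id: :process_a, start: {Agent, :start_link, [fn -> :a end, [name: via.(:process_a)]]}},
  %{id: :process_b, start: {Agent, :start_link, [fn -> :b end, [name: via.(:process_b)]]}}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Option 1: look the pid up in the Registry from the calling process.
[{pid_from_registry, _value}] = Registry.lookup(CompareSketch.Registry, :process_a)

# Option 2: a synchronous call to the supervisor, then a linear scan of its children.
{_id, pid_from_supervisor, _type, _modules} =
  sup
  |> Supervisor.which_children()
  |> Enum.find(fn {id, _child, _type, _modules} -> id == :process_a end)
```

Both paths end up with the same pid; the second one simply goes through the supervisor's mailbox on every lookup.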


Worrying about race conditions in Registry is unproductive. Assume it works perfectly until it doesn’t, I say. :smiley:


It’s O(n), but there will likely be only 2 children, maybe 3 (very unlikely). AFAIK, terminate isn’t invoked on a brutal kill, and yes, that approach is similar to what the registry does.

There are potentially tens (or maybe hundreds) of clients that will need to simultaneously discover and operate on ProcessA, but ProcessA also uses call for each domain operation, and its relation to the supervisor is basically 1-to-1 (so first a call to the Supervisor, then a call to ProcessA). I wonder if this additional Supervisor call can still introduce additional problems as compared to just calling ProcessA? AFAIU, ProcessA will queue and serialize messages, and call even acts as a backpressure mechanism, which is considered beneficial. So in this situation, can the same be said about the Supervisor call?

But the additional call (and iteration) is still overhead, so Registry is indeed more performant. My only concern is that the status of ProcessA is not explicit if it is somehow not operating during the lookup: I’ll just get an empty list as opposed to :restarting (or :undefined). But indeed, maybe this is not something to worry about much, as noted by @dimitarvp.

I wonder if this additional Supervisor call can still introduce additional problems as compared to just calling ProcessA?

yes, this is a bottleneck

My only concern is that the status of ProcessA is not explicit if it somehow not operating during the lookup

It’s either up or it’s not! If it’s not alive, the result for the client is the same whether it’s restarting, down, or whatever.


Yes, and there’s always the possibility that the process will go down in between the call to find its pid and when you actually send it a message. These crashes should be abnormal, however, and rare if the code is written well.
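
A client that wants to handle this window gracefully can catch the exit raised by the call. This is just a sketch under assumptions: `SafeCallSketch` and its `Echo` server are hypothetical names, and only the "no such process" case is handled here; other exit reasons (for example a crash in the middle of the call) still propagate.

```elixir
defmodule SafeCallSketch do
  # Wraps GenServer.call/3 so a missing or already dead target becomes a
  # tagged error instead of an exit in the caller.
  def call(server, request, timeout \\ 5_000) do
    {:ok, GenServer.call(server, request, timeout)}
  catch
    :exit, {:noproc, _} -> {:error, :not_running}
  end
end

defmodule SafeCallSketch.Echo do
  # Minimal stand-in for ProcessA: replies with whatever it was asked.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(request, _from, state), do: {:reply, request, state}
end

{:ok, echo} = SafeCallSketch.Echo.start_link([])
{:ok, :ping} = SafeCallSketch.call(echo, :ping)

# Stop the server, then call the stale pid: the exit is turned into an error tuple.
:ok = GenServer.stop(echo)
{:error, :not_running} = SafeCallSketch.call(echo, :ping)
```

The same wrapper works with a via tuple as the `server` argument, since a name that is not registered also makes GenServer.call exit with a `:noproc` reason.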

Personally I’d go with Registry + via_tuple approach, these two can be used together.
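
A minimal sketch of what combining the two might look like, assuming a simplified tree in which the DynamicSupervisor starts the room server directly rather than a RoomSupervisor. All names here (`RoomSketch`, `RoomSketch.Server`, `RoomSketch.Registry`, `RoomSketch.Rooms`) are hypothetical.

```elixir
defmodule RoomSketch.Server do
  # Stand-in for ProcessA, registered under a via tuple keyed by room id.
  use GenServer

  def via(room_id), do: {:via, Registry, {RoomSketch.Registry, {:process_a, room_id}}}

  def start_link(room_id), do: GenServer.start_link(__MODULE__, room_id, name: via(room_id))

  @impl true
  def init(room_id), do: {:ok, room_id}

  @impl true
  def handle_call(:ping, _from, room_id), do: {:reply, {:pong, room_id}, room_id}
end

defmodule RoomSketch do
  # Starts the room on demand. If it is already running (including when another
  # client won the race to start it), the existing pid is returned.
  def ensure_started(room_id) do
    case DynamicSupervisor.start_child(RoomSketch.Rooms, {RoomSketch.Server, room_id}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
      other -> other
    end
  end

  # Domain operation: make sure the room exists, then address it by name.
  def ping(room_id) do
    with {:ok, _pid} <- ensure_started(room_id) do
      GenServer.call(RoomSketch.Server.via(room_id), :ping)
    end
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: RoomSketch.Registry)
{:ok, _} = DynamicSupervisor.start_link(name: RoomSketch.Rooms, strategy: :one_for_one)
```

Clients never hold on to a pid in this arrangement; every operation goes through the via tuple, so a restarted process is picked up on the next call.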
