Process registry across a cluster: one global versus many local registries

I have a cluster of nodes, each node containing some processes that are used primarily for maintaining and retrieving state. That state information is read frequently and does not change very often. When I do a read, it always has to call local processes AND remote processes. Writes are typically local only.

Currently I am using the awesome syn library, adding each process to global groups, and then using :syn.multi_call/3 to retrieve state from all the processes on the cluster. So far this has worked mostly fine, except for some bugs resulting from syn’s inherent eventual consistency.
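
For context, the current setup looks roughly like this. This is a minimal sketch assuming a pre-3.0 syn API (no scopes); the group name, the :get_summary message, and the summarize/1 helper are just illustrative:

```elixir
# Minimal sketch, assuming a pre-3.0 syn API (no scopes).
# :site_sessions, :get_summary, and summarize/1 are illustrative names.
defmodule SiteSessionServer do
  use GenServer

  def init(site_id) do
    # Join a cluster-wide group so readers can reach every instance.
    :ok = :syn.join(:site_sessions, self())
    {:ok, %{site_id: site_id}}
  end

  # syn delivers multi_call requests as a plain message and expects
  # the reply to go back through :syn.multi_call_reply/2.
  def handle_info({:syn_multi_call, caller_pid, :get_summary}, state) do
    :syn.multi_call_reply(caller_pid, summarize(state))
    {:noreply, state}
  end

  # Return only what's needed, not the whole state.
  defp summarize(state), do: Map.take(state, [:site_id])
end

# A reader gathers state from every member, local and remote:
# {replies, _bad_pids} = :syn.multi_call(:site_sessions, :get_summary, 5_000)
```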

Another solution I had considered was to start up a local process registry on each individual node (e.g. Registry), and then have a single “manager” or “coordinator” process that would know how to call all its local processes and then return the results en masse to remote nodes with a single remote call (e.g. GenServer.multi_call/4 to the coordinator only).
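
Something like the following, as a hedged sketch. Coordinator, LocalRegistry, and the :sessions / :get_summary names are all hypothetical:

```elixir
# Hedged sketch of the coordinator idea. Coordinator, LocalRegistry,
# and the :sessions / :get_summary names are hypothetical.
defmodule Coordinator do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def init(nil), do: {:ok, nil}

  # One remote call per node; the coordinator fans out to its local
  # processes and returns all of their replies at once.
  def handle_call(:collect, _from, state) do
    replies =
      LocalRegistry
      |> Registry.lookup(:sessions)
      |> Enum.map(fn {pid, _value} -> GenServer.call(pid, :get_summary) end)

    {:reply, replies, state}
  end
end

# Each state-holding process registers locally under a duplicate key:
#   {:ok, _} = Registry.start_link(keys: :duplicate, name: LocalRegistry)
#   {:ok, _owner} = Registry.register(LocalRegistry, :sessions, nil)

# A reader then issues a single call per node:
#   {replies, _bad_nodes} =
#     GenServer.multi_call([node() | Node.list()], Coordinator, :collect, 5_000)
```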

Does this latter pattern have a name? What tradeoffs would I have to consider in terms of performance and flexibility versus a global process registry? It seems like the coordinator could easily become a message bottleneck but I’m not sure how to design around that.

In this particular case, should I just create an mnesia table and use that to sync up the state across nodes?


Can we have some more information about your use case? For example, how much data are we talking about? How many processes? How many nodes? How frequent are the frequent reads and are they concurrent?

Currently there are 2 nodes. The reads are concurrent - they’re being called from Phoenix channel processes as messages are sent and received. There are two process types:

UserSessionServer has 50-100 processes, each holding Ecto schemas with a bunch of nested preloads. (side note: I learned quickly NOT to pipe these giant structs across the network - the GenServers return only what's necessary) observer_cli reports ~1 MB per process. The most common reads happen ~2-5 times per second per process (~235 KB per server), but those happen to be all local, since the user session process is linked to its Phoenix channel. multi_calls are done a few times a minute, when a user joins/leaves (~13 KB per server).

SiteSessionServer has ~30 processes, also holding Ecto schemas with preloads, each reporting ~1.7 MB. The local reads happen 1-2 times per second per server (~335 KB per server), and remote reads several times per minute (tiny payloads) - again, when a user joins/leaves.

I got the message sizes by a simple message |> :erlang.term_to_binary() |> byte_size().

So I’d say you are right to be concerned that this may become a bottleneck. In your current setup it is the calling process (the reader) that has to wait for a response from each local and remote process.

With the coordinator pattern it is the “manager” that has to wait instead. The general rule, of course, is to keep the work a GenServer does per message as quick and short as possible.
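
One way to design around the bottleneck, if you do go the coordinator route, is to make sure the coordinator never blocks its own loop on the fan-out - for example by replying from a Task. A sketch, reusing the hypothetical names from the earlier example:

```elixir
# Sketch: the coordinator replies asynchronously so its mailbox keeps
# draining while the fan-out runs. LocalRegistry / :sessions /
# :get_summary are the same hypothetical names as before.
def handle_call(:collect, from, state) do
  Task.start(fn ->
    replies =
      LocalRegistry
      |> Registry.lookup(:sessions)
      |> Enum.map(fn {pid, _value} -> GenServer.call(pid, :get_summary) end)

    # GenServer.reply/2 lets a process other than the GenServer
    # itself answer the pending call.
    GenServer.reply(from, replies)
  end)

  {:noreply, state}
end
```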

I’m not quite sure whether the goal is to have global state, with the global process registry just being one way to implement it, or whether the global process registry is something you need in its own right.

If the requirement is global state, then mnesia would be a great alternative. syn uses mnesia (or at least used to) for its process registry. I’d recommend you explore this over the “manager/coordinator” solution.

Instead of using any library, you may consider a distributed mnesia table where the key is node() (or some combination of values, so that you can have multiple entries per node if needed) and the value is the state. If you cannot live with eventual consistency, make sure you use transactions when you write; plain reads should then be fine. Setting {read_concurrency, true} on the table’s ETS backend (via the storage_properties option of :mnesia.create_table/2) will probably help in your case.
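
As a sketch, assuming mnesia is already started and clustered on all nodes (schema / :extra_db_nodes setup is elided), with illustrative table and field names:

```elixir
# Minimal sketch of the distributed mnesia table described above.
# Assumes mnesia is already started and clustered; :cluster_state
# and the field names are illustrative.
defmodule ClusterState do
  @table :cluster_state

  def create_table(nodes) do
    :mnesia.create_table(@table,
      attributes: [:node, :state],
      ram_copies: nodes,
      storage_properties: [ets: [read_concurrency: true]]
    )
  end

  # Writes go through a transaction so replicas stay consistent.
  def put(state) do
    {:atomic, :ok} =
      :mnesia.transaction(fn -> :mnesia.write({@table, node(), state}) end)

    :ok
  end

  # Dirty reads hit the local replica only, so a reader never waits
  # on a remote process.
  def get_all do
    for {@table, node, state} <- :mnesia.dirty_match_object({@table, :_, :_}),
        do: {node, state}
  end
end
```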

Best,
r.


Thanks folks. I will be investigating the mnesia approach - it does sound like the right tool for the job.

No worries. Thanks for trying and liking syn.