Best way to logically map IDs to nodes & processes in a distributed Erlang cluster, fast?

Hey everyone,

So I’ve been thinking about this problem for a while now, and I can’t work out the best way to solve it.

I basically have resource IDs (Snowflakes), each of which will run on its own GenServer. There are multiple types of resources - users and groups - which run on different Elixir codebases but will still be connected within an Erlang cluster.

Each group will run on its own GenServer, but it’s dynamic: the group GenServer will only run while users in that specific group are online on the service. For example, when a user GenServer starts, it’ll grab the groups that user is in from a DB. It’ll then need to query some registry to get the PID running each group, and call each group GenServer to connect to it (the group GenServer will just store a list of the connected PIDs). If the registry finds that the group isn’t already online, it’ll need to start that GenServer and then send the PID back as a reply to the service that called the registry. Remember that this all needs to be done in a distributed Erlang cluster.

Then, when an event happens within that group (e.g. a message is sent), the group GenServer will fan out the message to the user GenServers, which will send it to the users down a websocket.

Now, here’s where my problem comes into play. Messages can be sent from external services written in different languages. For example, the API, which is written in JS on Node, will need to tell the Elixir group GenServer corresponding to an ID that a message was sent to that group, probably via RabbitMQ somehow. The thing is, the API won’t be able to find the PID of that group because it has no logical map to it.

I know there are hash rings, which are stateless, but I’m not sure how to write a hash function correctly. And even then, wouldn’t that mean that some nodes could end up running a different number of groups without knowing, since only the ID would be hashed?
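
As a rough illustration of the idea (not any particular library’s implementation), a hash ring can be built on a standard hash such as SHA-256 rather than a hand-written one. The Python sketch below is a toy under stated assumptions: the `HashRing` class, the `vnodes` parameter, the node names and the sample IDs are all made up for this example. Placing each node on the ring many times ("virtual nodes") is the usual way to even out how many IDs each node receives.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer using SHA-256 (a standard hash, nothing custom)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy hash ring; hypothetical, for illustration only."""

    def __init__(self, vnodes: int = 128):
        self.vnodes = vnodes   # number of points placed on the ring per node
        self._points = []      # sorted list of (hash, node) tuples

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def key_to_node(self, key) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._points, (_hash(str(key)), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing()
ring.add_node("a")
ring.add_node("b")

# Made-up snowflake-like IDs: any integers work, only their hash matters.
ids = [175928847299117063 + i for i in range(10_000)]
counts = Counter(ring.key_to_node(i) for i in ids)
print(counts)
```

Because the mapping depends only on the ID and the node list, any service that knows the node list can compute it without a lookup table. With only a handful of IDs the split can still look lopsided by chance; it tends to even out as the number of IDs grows.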

At the moment, my only solution is to create some sort of dictionary service in Elixir which stores an ETS table mapping every single ID to its corresponding node & PID. It can then listen for RabbitMQ messages from the API addressed to a specific resource ID, look up the PID/node in the ETS table and forward the message on. The user service can also use this dictionary service when initializing: it can get group PIDs by querying the service, and if a group GenServer isn’t started then the dictionary service can start it and respond with the corresponding PID.
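
The lookup-or-start behaviour of such a dictionary service can be sketched roughly as follows. This is a single-process Python toy, not distributed Elixir: the `Registry` class, `lookup_or_start` and the `start_group` callback are names invented for the example, a plain dict stands in for the ETS table, and a string stands in for a PID.

```python
import threading


class Registry:
    """Toy stand-in for the dictionary service: maps resource ID -> handle."""

    def __init__(self, start_fn):
        self._start_fn = start_fn       # called to start a missing resource
        self._table = {}                # plays the role of the ETS table
        self._lock = threading.Lock()   # serialises lookup-or-start

    def lookup_or_start(self, resource_id):
        """Return the handle for resource_id, starting the resource if needed."""
        with self._lock:
            handle = self._table.get(resource_id)
            if handle is None:
                handle = self._start_fn(resource_id)
                self._table[resource_id] = handle
            return handle

    def remove(self, resource_id):
        """Forget a resource, e.g. after its process has stopped."""
        with self._lock:
            self._table.pop(resource_id, None)


started = []  # records every start, to show a group is started only once


def start_group(group_id):
    started.append(group_id)
    return f"group-process-{group_id}"  # stand-in for a real PID


registry = Registry(start_group)
first = registry.lookup_or_start(42)   # not running yet, so it is started
second = registry.lookup_or_start(42)  # already running, same handle returned
print(first, second, started)
```

Doing the check and the start under one lock is what stops two callers from starting the same group twice. A single GenServer would give the same serialisation, since it handles one call at a time.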

Sorry if this is super long, it’s really hard to explain properly what I’m trying to do. Please let me know if you have any questions, as some things might be unclear. I’d love to know what you think the best solution would be, and if I’m thinking about hash rings correctly.

Thanks for the help in advance!

Perhaps libring covers your needs?

Thanks! I’ve been trying out this library, but with my IDs it doesn’t seem to resolve to nodes in a well-balanced way. For example, I just generated 10 IDs, inserted 2 nodes (a & b) into the hash ring and called key_to_node/2 on each ID, and 8 of the IDs resolved to node a. Am I doing something wrong or is this expected behavior? If it’s expected then I can’t use a hash ring like this, because I need each node to have a somewhat balanced number of GenServers running.

Are you setting your node weights or just using the defaults? Try setting each weight to 1 and I expect the distribution to be quite even.

Yeah, just tried that and it was a lot more balanced. Thing is, my nodes autoscale - if I add a new node, then some IDs that previously resolved to a certain node will no longer resolve to that node, right? I need the mapping to stay consistent even when new nodes are added dynamically, which is why I thought hash rings weren’t my solution at first.
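
For what it’s worth, this is the trade-off a consistent hash ring makes: when a node joins, the IDs that change owner all move to the new node, and the rest stay where they were. Below is a minimal Python sketch of that property. It is toy code with made-up names (`build_ring`, `owner`, nodes "a", "b", "c") and is not libring’s internals.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """64-bit integer derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, vnodes=128):
    """Return a sorted list of (hash, node) points, `vnodes` points per node."""
    return sorted((h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))


def owner(ring, key):
    """Node owning `key`: first ring point at or after the key's hash, wrapping around."""
    idx = bisect.bisect_left(ring, (h(str(key)), ""))
    return ring[idx % len(ring)][1]


ids = range(10_000)                   # stand-ins for real resource IDs
before = build_ring(["a", "b"])
after = build_ring(["a", "b", "c"])   # node "c" joins after autoscaling

moved = [i for i in ids if owner(before, i) != owner(after, i)]
print(f"{len(moved)} of {len(ids)} IDs changed owner")
# Every ID that moved now belongs to the new node; none moved between "a" and "b".
print(all(owner(after, i) == "c" for i in moved))
```

With equal weights, a third node takes over about a third of the IDs on average. The ring only says where an ID should live, so any GenServers already running for the moved IDs would still have to be handed off or restarted on the new node.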

Is this relevant for you? https://github.com/uwiger/gproc/blob/master/README.md If not, please ignore it; I have to admit I only skimmed through your post.

Have you looked at Horde? Seems promising to me for this kind of use case.

Also Phoenix.PubSub, maybe.

This is exactly what :pg2 is for.