I’ve set up a simple Phoenix app with the Presence module. I’d like to send a RabbitMQ notification each time a socket disconnects. This is straightforward while all nodes are up and running. However, it is not so easy when there’s a problem with a node.
For example, let’s say I have 3 nodes and User A is on Node 1. Now Node 1 dies and User A does not reconnect. I could monitor “leave” messages on the other nodes and send the notification to RabbitMQ, but that means both Node 2 and Node 3 will send this notification. I’d like to be able to send it only once.
Any suggestions on how to do it?
You can have a single node start the RabbitMQ subscriber(s) on the topics whose leave events you care about. This could be a dedicated node, or you could conditionally start the children on Node 1 or Node 2 in your example, but not both.
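For the conditional start, a minimal sketch could look like this. `MyApp.ForwarderConfig`, `MyApp.LeaveForwarder` and the `LEAVE_FORWARDER` environment variable are made-up names for illustration, not anything Phoenix provides:

```elixir
defmodule MyApp.ForwarderConfig do
  # Returns the extra supervisor children for this node.
  # Only the node started with LEAVE_FORWARDER=true gets the forwarder;
  # every other node gets an empty list.
  # MyApp.LeaveForwarder is a hypothetical process that subscribes to the
  # topics and publishes leave events to RabbitMQ.
  def children(env \\ System.get_env()) do
    if Map.get(env, "LEAVE_FORWARDER") == "true" do
      [MyApp.LeaveForwarder]
    else
      []
    end
  end
end
```

You would then append `MyApp.ForwarderConfig.children()` to the children list in your application supervisor, and set the variable on exactly one node.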
Doesn’t adding one dedicated node mean that there’s a single point of failure now? E.g. if this dedicated node goes down then it can miss events and not forward them?
Yes, but it depends on the guarantees you want and the scale you need. The Rabbit service itself could be considered a single point of failure, no? (If it runs on a single server.) So running the Elixir process that talks to Rabbit on the same node that runs Rabbit would be completely fine, in my opinion, if it meets your scaling needs. It’s also the least complex solution. Otherwise you need to handle fail-over, or do sharding against a fixed cluster size, or start answering questions about the guarantees you want, with different solutions depending on your answers.
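If you go down the fail-over route, one way to get a single sender is to let every surviving node compute the same "owner" for a user from a sorted node list, and only the owner publishes. This is only a sketch: `LeaveOwner` is a hypothetical module, and it assumes all surviving nodes see the same cluster membership when the leave arrives, which is not guaranteed during a netsplit.

```elixir
defmodule LeaveOwner do
  # Hypothetical helper, not part of Phoenix.
  # Picks one node out of `nodes` for a given user. Sorting first makes the
  # result independent of the order in which each node lists its peers.
  def owner(user_id, nodes) when nodes != [] do
    sorted = Enum.sort(nodes)
    # :erlang.phash2/2 returns an integer in 0..length(sorted) - 1
    Enum.at(sorted, :erlang.phash2(user_id, length(sorted)))
  end

  # True when `this_node` is the node that should send the notification.
  def owner?(user_id, nodes, this_node) do
    owner(user_id, nodes) == this_node
  end
end
```

On each node you would call something like `LeaveOwner.owner?(user_id, [node() | Node.list()], node())` when a leave comes in, and publish only if it returns `true`.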
So to start figuring this out, what kind of scale do you require? Also, what’s the use case for sending these messages into Rabbit? If we know more about your use case, we may be able to say which way to go based on the guarantees you need. For example, you could have the local nodes send the Rabbit message on channel pid DOWN, but as you alluded to, you will miss the message if the node goes away ungracefully. This may or may not be okay for your use case.
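To show what I mean by sending on channel pid DOWN, here is a rough sketch using only `GenServer` and `Process.monitor/1`. `ChannelWatcher` and the `notify_fun` callback are hypothetical; in a real app the callback would publish to Rabbit and `watch/3` would be called from the channel’s `join`.

```elixir
defmodule ChannelWatcher do
  use GenServer

  # notify_fun is a 1-arity function called with the user id when the
  # watched process goes down (assumption: this is where you publish).
  def start_link(notify_fun), do: GenServer.start_link(__MODULE__, notify_fun)

  # Ask the watcher to monitor `pid` on behalf of `user_id`.
  def watch(watcher, pid, user_id), do: GenServer.call(watcher, {:watch, pid, user_id})

  @impl true
  def init(notify_fun), do: {:ok, %{notify: notify_fun, refs: %{}}}

  @impl true
  def handle_call({:watch, pid, user_id}, _from, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, put_in(state.refs[ref], user_id)}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Look up which user this monitor belonged to and notify once.
    {user_id, refs} = Map.pop(state.refs, ref)
    if user_id, do: state.notify.(user_id)
    {:noreply, %{state | refs: refs}}
  end
end
```

Note that the watcher lives on the same node as the channel, so it dies with the node. That is exactly the gap described above.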
Thank you for your response.
We have around 10K-100K concurrent online users. There are a couple of other services that need information about a small set of users leaving the system. Not all of these services are built in Erlang.
One of the services is a 1-1 chat service (simplified). We need to end the chat session and trigger some callbacks (webhooks) when one or both of the participants leave. Leaving a lingering non-closed session is not an option. Here we probably could also use polling to sweep lingering sessions.
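To make the polling idea concrete, the sweep could be roughly the following. `SessionSweeper` and the session shape (a map with `:id` and `:participants`) are assumptions for illustration, and the list of online users would have to come from Presence or a similar source:

```elixir
defmodule SessionSweeper do
  # Hypothetical helper. Returns the sessions that should be closed because
  # at least one participant is no longer in the list of online users.
  def lingering(sessions, online_user_ids) do
    online = MapSet.new(online_user_ids)

    Enum.filter(sessions, fn %{participants: participants} ->
      Enum.any?(participants, fn id -> not MapSet.member?(online, id) end)
    end)
  end
end
```

A periodic job would call `SessionSweeper.lingering/2`, then close each returned session and fire its webhooks.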
The other service is more complex. It’s a user queuing system. We need to remove the user from the queue when the user leaves the system. This also has to happen when the presence node the user was connected to dies ungracefully and the user does not reconnect.
Using rabbit is not a requirement. The requirement is that all sessions are properly closed.