I have a few suggestions based on what you’ve said so far, and I hope they help. Reading through your description, you are first sending all messages to a GenServer, `Rabbit`, which is responsible for maintaining a connection to a RabbitMQ server and acting as a message buffer for the upstream service. You may have done this intentionally, but it raises two concerns for me:
- Your message buffer is directly associated with the health of the connection. If the connection encounters an error, you lose your message buffer.
- You’ve serialized the message dispatch to Rabbit; this creates an artificial bottleneck in your code that you don’t need unless you must send the messages in the order they were received by the GenServer (which is unlikely, given that it would be difficult to guarantee their order in the first place).
To put this more bluntly, `Rabbit` has two responsibilities in its state: the connection and the message queue.
The `Connection` behaviour is specifically designed to help manage a connection to a remote service, and its documentation is very useful for understanding how to start a connection when the process is started (by returning `{:connect, info, state}` from `init/1`). This is a good use of a GenServer process because the connection and connection state are held in the GenServer state. The GenServer can then give informative responses to callers.
However, I don’t see any reason in your description to have a buffer. I would remove the buffer part completely and make the processes handling the HTTP requests responsible for calling `Rabbit`. Because `Rabbit` uses `Connection`, even when the connection to the Rabbit server isn’t available, it can handle messages from callers and inform them, with a response like `{:reply, :noconnect, state}`. The caller can then make a decision about whether to retry or inform the downstream client that there was an error.
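To make that concrete, here is a minimal sketch of the idea. So that it runs without dependencies, it uses a plain `GenServer` and a stubbed connect function rather than the `Connection` behaviour and a real AMQP client; with `Connection`, the connect step would live in the `connect/2` callback instead. The `publish/2` and `deliver/3` functions, the `:connect` option, the `{:error, :noconnect}` reply shape, and the retry and backoff values are all my own assumptions, not anything from your code.

```elixir
defmodule Rabbit do
  use GenServer

  # opts[:connect] is a hypothetical zero-arity function returning
  # {:ok, conn} or {:error, reason}; it stands in for the real broker client.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, Keyword.take(opts, [:name]))
  end

  def publish(server, payload), do: GenServer.call(server, {:publish, payload})

  @impl true
  def init(opts) do
    state = %{conn: nil, connect: Keyword.fetch!(opts, :connect)}
    # Connect after init/1 returns so a slow broker does not block startup.
    {:ok, state, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    case state.connect.() do
      {:ok, conn} ->
        {:noreply, %{state | conn: conn}}

      {:error, _reason} ->
        # Stay alive without a connection and try again in one second.
        Process.send_after(self(), :reconnect, 1_000)
        {:noreply, state}
    end
  end

  @impl true
  def handle_info(:reconnect, state), do: {:noreply, state, {:continue, :connect}}

  @impl true
  def handle_call({:publish, _payload}, _from, %{conn: nil} = state) do
    # No buffering: tell the caller immediately that there is no connection.
    {:reply, {:error, :noconnect}, state}
  end

  def handle_call({:publish, payload}, _from, %{conn: conn} = state) do
    # Stub: the "connection" is just a pid that receives the payload.
    send(conn, {:published, payload})
    {:reply, :ok, state}
  end
end

defmodule HttpHandler do
  # Hypothetical caller: retries a few times, then reports failure upstream.
  def deliver(server, payload, attempts \\ 3)

  def deliver(_server, _payload, 0), do: {:error, :unavailable}

  def deliver(server, payload, attempts) do
    case Rabbit.publish(server, payload) do
      :ok ->
        :ok

      {:error, :noconnect} ->
        Process.sleep(100)
        deliver(server, payload, attempts - 1)
    end
  end
end
```

The point of the sketch is only that the state holds the connection and nothing else, and that the decision to retry or give up sits with the process that is talking to the client.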
The model I described above has a few benefits:
- There is no longer any message queue to deal with, so you no longer risk “losing everything” that was in a GenServer’s state because of minor errors.
- The success state is pushed farther downstream, towards the client, and the client is now in agreement about the state of the message. This empowers the end user to make a decision to try again or not, and it also avoids a situation where the end user was informed something succeeded only for it to fail silently somewhere else.
For the model I described above, in the supervision tree, `Rabbit` should be at the far left, and the HTTP processes should be to the right of `Rabbit` (possibly as children of an HTTP process supervisor). This ensures `Rabbit` is started and allowed to connect before HTTP requests are handled. Also, in the case that `Rabbit` suffers irrecoverable issues and isn’t able to maintain a connection, this will force the HTTP processes to restart.
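As a rough sketch of that ordering, the supervisor below starts the connection process first and the HTTP supervisor second. All module names are placeholders I made up, and the two children are trivial stubs so the example runs on its own. I’ve also assumed a `:rest_for_one` strategy, which restarts everything to the right of a child that crashes; with the default `:one_for_one`, the HTTP processes would only go down if `Rabbit` exceeded the supervisor’s restart limit and took the whole tree with it.

```elixir
defmodule Demo.Rabbit do
  # Stub standing in for the real connection process.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> :connected end, name: __MODULE__)
end

defmodule Demo.HTTPSupervisor do
  # Stub standing in for whatever supervises the HTTP-handling processes.
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: Supervisor.init([], strategy: :one_for_one)
end

defmodule Demo.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      # "Far left": started first, so it can connect before requests arrive.
      Demo.Rabbit,
      # To the right: restarted whenever Demo.Rabbit is restarted.
      Demo.HTTPSupervisor
    ]

    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

Children are started in list order, which is what “left to right” means in practice here.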
The model above still suffers from an inherent flaw, though: the GenServer will only process messages serially, so all HTTP connections will have to share the single `Rabbit` connection. If your HTTP requests pile up faster than your connection can handle them, the `call`s will time out. As your service grows, you may instead require a pool of `Rabbit` connections, in which case the `RabbitPool` would be at the far left in the same manner as described above, but the HTTP processes must now check out a `Rabbit` connection from the pool before performing operations. This allows you to service multiple HTTP requests concurrently.
Overall, I think that would move you towards a more robust service and avoid headaches about losing state. I may have missed something in your description that makes what I described above inadequate, though. If that’s the case, let me know what it is, and I can try to provide a better answer.