I’m currently using Poolboy to manage a pool of AMQP connections. Some of my unit tests kill connections to check that the service comes back. However, Poolboy hands out dead connections, because the kill in the actual connection module takes a while to bubble up to the GenServer’s terminate function. I’m following the advice given for AMQP, but it just isn’t working out.
I can’t quite understand your setup, but you can prevent poolboy from giving the worker to the next process by monitoring the worker and waiting for the :DOWN message. Calling GenServer.stop on the worker provides this guarantee. This works because poolboy is linked to the worker processes, and exit signals are sent before monitor signals. Therefore poolboy is guaranteed to have the exit signal in its message queue before the monitoring process receives the :DOWN. This means poolboy will handle the exit signal first and replace the worker.
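To make the monitor-and-wait idea concrete, here is a minimal sketch. The module names `DemoWorker` and `StopAndWait` and the helper `kill_and_wait/2` are hypothetical and not part of poolboy or AMQP; the worker is a bare GenServer standing in for a real pool worker.

```elixir
defmodule DemoWorker do
  # Hypothetical stand-in for a pool worker; it only holds some state.
  use GenServer

  @impl true
  def init(state), do: {:ok, state}
end

defmodule StopAndWait do
  # Hypothetical helper: kill the worker and block until its :DOWN arrives.
  def kill_and_wait(worker, timeout \\ 5_000) do
    # Monitor before sending the exit signal so the :DOWN cannot be missed.
    ref = Process.monitor(worker)
    Process.exit(worker, :kill)

    receive do
      {:DOWN, ^ref, :process, ^worker, reason} -> {:ok, reason}
    after
      timeout ->
        # Clean up the monitor and any late :DOWN message.
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end

# Usage: GenServer.start/2 (unlinked) keeps the caller alive in this demo.
{:ok, worker} = GenServer.start(DemoWorker, :some_state)
{:ok, _reason} = StopAndWait.kill_and_wait(worker)

# GenServer.stop/1 is the simpler option when a clean shutdown is acceptable;
# it only returns once the process has terminated.
{:ok, other} = GenServer.start(DemoWorker, :some_state)
:ok = GenServer.stop(other)
```

In a test, the next checkout would happen only after `kill_and_wait/2` or `GenServer.stop/1` has returned, by which point poolboy should already have the exit signal in its queue.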
That’s what I’m doing now. So, I have a pool which has many workers. Each worker is a GenServer with a Connection in its state. If I kill the connection and put the worker back, then manage to check it out again soon after, the worker hasn’t yet had its :DOWN handler called, and so the next process to use it crashes.
What would be really nice is if Poolboy checked for a particular function in the worker, such as an is_ready() function, where the server could check whether the Connection object’s PID was alive and return true or false. That would solve the issue, I think.
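Roughly what I have in mind on the worker side is below. This is only a sketch: poolboy has no such hook as far as I know, `ReadyWorker` and `ready?/2` are hypothetical names, and an Agent stands in for the real connection process.

```elixir
defmodule ReadyWorker do
  # Hypothetical worker that holds a connection pid in its state.
  use GenServer

  def start(conn_pid), do: GenServer.start(__MODULE__, conn_pid)

  # Hypothetical readiness check; a busy worker would block this call
  # until the timeout expires.
  def ready?(worker, timeout \\ 1_000) do
    GenServer.call(worker, :ready?, timeout)
  end

  @impl true
  def init(conn_pid), do: {:ok, %{conn: conn_pid}}

  @impl true
  def handle_call(:ready?, _from, %{conn: conn} = state) do
    # Process.alive?/1 only works for pids on the local node.
    {:reply, is_pid(conn) and Process.alive?(conn), state}
  end
end

# Usage: an Agent plays the role of the connection process.
{:ok, conn} = Agent.start(fn -> :connected end)
{:ok, worker} = ReadyWorker.start(conn)
ReadyWorker.ready?(worker)

# Agent.stop/1 returns only after the agent has terminated,
# so the next check sees a dead connection.
:ok = Agent.stop(conn)
ReadyWorker.ready?(worker)
```

Whoever checks out the worker could call `ready?/2` first and retry with another worker if it returns false.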
Would poolboy be making a call to the worker? If the worker is busy, this blocks the whole pool while it waits on a single worker. This could be prevented by having the worker do the checkin instead of the calling process. However, the worker then needs to do the monitoring instead of poolboy, and the worker needs to know about poolboy, and suddenly everything is different. I have a library that follows a similar pattern: https://github.com/fishcakez/sbroker but by making some hard problems easier it also makes some easy problems harder. I would not recommend it unless it solves a problem that a simpler solution cannot.