We have transaction requests that can take 30-70 seconds, and aborting them is a “very bad thing”. We also need to be able to deploy with zero downtime. Ideally we could detect when Phoenix is done serving traffic, so that we can shut down the old version in the case of blue-green deploys, or move on to the next server in the case of rolling deploys.
Does anyone have an example of gracefully shutting down Phoenix without aborting existing requests? In other words, drain-stopping Phoenix. At the moment, if you trigger a shutdown with, say, exrm's stop command, any existing long-running request will be aborted.
Any advice and/or examples on zero-downtime deploys would be much appreciated. Hot upgrades are not an option for us: we’ve tried them, they require too much overhead for continuous deployment, and there are enough cases where they just do not work yet.
Are those 30-70 second transactions being performed straight from Phoenix workers?
There is a somewhat nice solution that uses not one but two background job queues. Basically, when a job request comes in, you throw it on an intermediate queue. You have a background job worker that takes jobs from the intermediate queue and puts them on a second, working queue. A second worker takes jobs from the working queue and processes them, removing each job from the working queue when it’s done.
Now, if you want to gracefully shut down your app, you first shut down your intermediate queue worker. New job requests come in and land on the intermediate queue as before, but nothing moves them to the working queue.
In the meantime, the second worker keeps taking tasks from the working queue, removing them one by one. You can safely shut down your app once the working queue is empty.
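To make the two-queue idea concrete, here is a rough sketch. It is in Python with in-process queues and threads purely for illustration; it is not Phoenix or Elixir code, and `drain_demo`, `mover`, and `worker` are made-up names. A real setup would presumably use a persistent job queue so that the parked jobs survive the restart.

```python
import queue
import threading


def drain_demo():
    intermediate = queue.Queue()  # every new job request lands here
    working = queue.Queue()       # only jobs in here get processed
    stop_mover = threading.Event()
    results = []

    def mover():
        # Forwards jobs from the intermediate queue to the working queue
        # until it is told to stop.
        while not stop_mover.is_set():
            try:
                job = intermediate.get(timeout=0.05)
            except queue.Empty:
                continue
            working.put(job)
            intermediate.task_done()

    def worker():
        # Processes jobs. task_done() marks a job as finished, which is
        # what working.join() waits on below.
        while True:
            job = working.get()
            if job is None:  # sentinel: stop the worker
                working.task_done()
                return
            results.append(job * 2)  # stand-in for the real work
            working.task_done()

    mover_thread = threading.Thread(target=mover)
    worker_thread = threading.Thread(target=worker)
    mover_thread.start()
    worker_thread.start()

    for job in (1, 2, 3):
        intermediate.put(job)
    intermediate.join()  # wait until all three jobs have been forwarded

    # Graceful shutdown starts here: stop the mover first.
    stop_mover.set()
    mover_thread.join()

    intermediate.put(4)  # arrives during the drain and stays parked

    working.join()     # every forwarded job has finished
    working.put(None)  # now it is safe to stop the worker
    worker_thread.join()
    return results, intermediate.qsize()
```

Calling `drain_demo()` processes the three jobs that were forwarded before the drain started, and leaves the late job sitting on the intermediate queue for the next version of the app to pick up.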
That’s an interesting idea. At the moment they’re actually synchronous web requests, so we need a way to tell Phoenix to stop receiving requests. We’d also need a way to know when all of the pending requests in Phoenix have finished. We’d need this regardless of whether or not we had a queue or queues in place to do the actual work.
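The two requirements here (stop accepting new requests, then learn when the pending ones are done) boil down to an in-flight counter with a draining flag. Below is a minimal, language-neutral sketch in Python; `DrainGate` and its methods are hypothetical names and not part of Phoenix, Plug, or Cowboy. It assumes something in front of the app, such as a load balancer, sends rejected requests to another node.

```python
import threading


class DrainGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_enter(self):
        # Call at the start of a request. Returns False once draining has
        # begun, so the caller can reject the request.
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self):
        # Call when a request finishes, whether it succeeded or failed.
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def drain(self, timeout=None):
        # Stop admitting requests, then block until none are in flight.
        # Returns True if drained, False if the timeout expired first.
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A deploy script would call `drain()` with a timeout comfortably above the longest expected request (70 seconds in this case) and only stop the old version once it returns `True`.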