Hi everyone,
I’m posting because I assume this must be a common issue, but I’ve been going in circles and haven’t
found a clean, reliable pattern yet.
I run a fairly simple Phoenix app on Fly.io with a rolling deploy setup for two machines: one web, one worker for Oban/cron-heavy background work.
The database is Supabase Pro Postgres.
Each app machine has a fixed DB pool (web=10, worker=15), and the app works fine during normal operation.
The problem appears on deploys, especially consecutive deploys:
- new machines start
- old machines are still around briefly
- old idle DB connections from the previous release do not get cleaned up for a considerable time
- connection count spikes and can stay high enough that Postgres starts rejecting new connections
On Supabase I then see errors like:
- remaining connection slots are reserved for roles with the SUPERUSER attribute
- sorry, too many clients already
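For context, this is how I check what's piling up when the errors start. A query along these lines (run from IEx; `MyApp.Repo` is a placeholder for your actual repo module) groups server-side sessions by application name and state, so leftover connections from the previous release show up clearly:

```elixir
# Sketch: assumes an Ecto repo named MyApp.Repo.
# Counts Postgres sessions per application_name and state so you can
# see how many idle connections each release is still holding.
query = """
SELECT application_name, state, count(*) AS sessions
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, state
ORDER BY sessions DESC;
"""

MyApp.Repo.query!(query)
```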
My current solution is:
- run a delayed post-deploy DB hygiene task from the worker node only
- wait about 3 minutes after startup
- then terminate stale DB sessions that are idle / idle in transaction and older than 2 minutes
That works better than trying to do the cleanup in Fly's release_command, because release_command runs too early: before the old Machines have fully exited and before their sessions are actually stale.
But I can’t help thinking that I must be doing something wrong. Is this a common issue?
Thanks for your help!