Hi!
This weekend I saw jobs accumulating in the `available` state across all queues. We run a two-node setup. One of the nodes was the leader while jobs were accumulating (checked with `Oban.Peer.leader?/1`), and none of the queues were paused (checked with `Oban.check_queue/2`). Our Oban config is fairly default: the `:peer` option is not set, and there are two plugins, `Cron` and `Pruner`.
Unfortunately, I had no time to investigate. Replacing both nodes with fresh ones (we’re on k8s) resolved the issue, and jobs were processed again.
I want to prevent this in the future, but I’m not sure what to check next time to get to the bottom of it. I went through the troubleshooting guide, but I don’t think anything there covers this situation (I’m not using PgBouncer, for example). I also read something about the Stager modes (`local` vs `global`), but I’m not sure how to query which mode is active (or whether that information is useful at all).