In production, we have 3 application VM instances (load balanced, no distributed Elixir) and I seem to only see 1 of the instances listed in the oban_peers table.
My understanding is that these should be refreshed every ~30 seconds, but the expires_at in our production database seems to be almost 2 days away. Does this mean Oban is perhaps not registering all the peers? Is this table only used for leader election, or could a single entry here also mean that only one instance is working on the queue?
I also ask because we have some performance issues during a time window when we insert and process a large(ish) number of workflows. My theory is one node is doing most of the work (though not all according to logs).
Oban: 2.17.10
Oban_Pro: 1.4.7
We recently bumped from Oban 2.14.x and Pro 0.14.x
If I am not mistaken, this DB row is the lock obtained by one node to be considered the leader of the cluster, so it is expected to have a single record here. But it should indeed have been updated within the last 15 seconds or so.
Your understanding is correct: it should only bump expires_at about 30 seconds into the future. The time is computed on the server; is there a chance the clock on your production servers drifted?
The table isn’t named very well; it should be oban_leaders, really. An expires_at that far in the future could prevent the current leader from releasing on restart though, because the old leader’s row won’t expire for two days.
The peers table is only used for leadership, and leadership has nothing to do with working on queues. It is used for plugins, primarily.
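To make the lease behaviour discussed in this thread a bit more concrete, here is a toy model in Python. It is not Oban's code. It assumes a single-row leadership table, a 30 second lease, and expiry timestamps taken from each node's own clock; the names LeaderTable, LeaderRow and try_lead are made up for illustration. It only shows why one row is expected and how a clock that is two days fast could leave a lease nobody else can take over.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed lease length, taken from the "~30 seconds" mentioned in the thread.
LEASE_SECONDS = 30.0


@dataclass
class LeaderRow:
    node: str
    expires_at: float  # seconds, on whatever clock the writing node used


class LeaderTable:
    """Hypothetical stand-in for a leadership table that holds at most one row."""

    def __init__(self) -> None:
        self.row: Optional[LeaderRow] = None

    def try_lead(self, node: str, local_now: float) -> bool:
        """One election attempt by `node`, using that node's own clock reading."""
        # Drop the row if it has expired, judged by the caller's clock.
        if self.row is not None and self.row.expires_at < local_now:
            self.row = None
        # Nobody holds the lease: claim it.
        if self.row is None:
            self.row = LeaderRow(node, local_now + LEASE_SECONDS)
            return True
        # The current leader only pushes its own expiry forward.
        if self.row.node == node:
            self.row.expires_at = local_now + LEASE_SECONDS
            return True
        # Another node holds an unexpired lease: stay a follower.
        return False


TWO_DAYS = 2 * 24 * 60 * 60.0

table = LeaderTable()
# Node "a" wins the election while its clock runs two days fast.
a_leads = table.try_lead("a", local_now=TWO_DAYS)
# Node "b", with a correct clock, tries a minute later and is refused.
b_leads_early = table.try_lead("b", local_now=60.0)
# Only once b's clock passes the stale expiry can it take over.
b_leads_late = table.try_lead("b", local_now=TWO_DAYS + LEASE_SECONDS + 1)

print(a_leads, b_leads_early, b_leads_late)  # True False True
```

However many nodes call try_lead, the model never holds more than one row, which matches the single entry seen in oban_peers. Queue processing is deliberately left out, since leadership has nothing to do with which nodes work on queues.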