Oban cluster stuck with no leader

We attempted to update Oban Pro to 1.5, but something happened during the deploy and we were stuck with a non-existent node in the oban_peers table. We tried several things including restarting the cluster to get a new leader elected, but it never updated. We eventually deleted the stale node from the oban_peers table, hoping the nodes would then elect a new leader. That has not happened. How can we recover from this state?

Sharing the solution as discussed in Slack:

We finally figured out the issue but have no idea about the root cause. Somehow the name column in oban_peers was no longer a primary key column and therefore had no primary key index (a unique index). As a result, the upsert of the leader with [conflict_target: :name, on_conflict: :nothing] was failing, because the conflict target column had no unique index to use.
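For anyone who wants to check whether their table is in the same state, here is a rough sketch that lists the indexes on oban_peers from an IEx session. MyApp.Repo is a hypothetical name for the Ecto repo Oban is configured with, and the query assumes Postgres with the table in the default "public" schema (adjust if you use an Oban prefix).

```elixir
# Assumptions: MyApp.Repo is a placeholder for your own repo module,
# and oban_peers lives in the "public" schema.
sql = """
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public' AND tablename = 'oban_peers'
"""

%{rows: rows} = Ecto.Adapters.SQL.query!(MyApp.Repo, sql, [])

# A healthy table should show a unique index covering the name column.
# An empty list means the ON CONFLICT upsert has nothing to target.
IO.inspect(rows, label: "oban_peers indexes")
```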

Making it a primary key column again and initiating a re-election caused a new leader to be chosen and everything to start working again.


We figured out the core problem and fixed it, but we are still not sure how we got into this state in the first place. We had two deploys that ran migrations on the day in question, one of which was the update to Oban/Oban Pro/Oban Web/Oban Met. Both migrations completed with no issues. However, somehow our oban_peers table ended up without a primary key index on the "name" column. When election events fired, the leader upsert failed because there was no primary key to conflict on, so no node was considered the leader. To fix it, we added back the primary key index and manually fired off an election event with Oban.Notifier.notify(Oban.config(), :leader, %{down: inspect(Oban.config().name)})
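A minimal sketch of those two steps, in case it helps someone else. The migration module name is made up for illustration, and it assumes Postgres, the default "public" schema, and that the dropped constraint had the default name oban_peers_pkey. Note that adding the primary key will fail if the table currently holds duplicate or NULL names, so check the rows first.

```elixir
defmodule MyApp.Repo.Migrations.RestoreObanPeersPrimaryKey do
  # Hypothetical migration name; adapt it to your application.
  use Ecto.Migration

  def up do
    # Recreates the primary key (and its unique index) on name,
    # which the leader upsert needs as its conflict target.
    execute "ALTER TABLE oban_peers ADD PRIMARY KEY (name)"
  end

  def down do
    # Assumes Postgres' default constraint name for the table.
    execute "ALTER TABLE oban_peers DROP CONSTRAINT oban_peers_pkey"
  end
end
```

Once the migration has run, the election can be triggered from a remote IEx session on one of the nodes. The notify call is the one quoted above; the Oban.Peer.leader?/0 check afterwards is only a suggestion for confirming that some node took leadership, and it assumes the default Oban instance name.

```elixir
conf = Oban.config()

# Same call as in the post above: tells peers the leader is down
# so that they attempt a new election.
Oban.Notifier.notify(conf, :leader, %{down: inspect(conf.name)})

# Optional check, run on each node; should return true on exactly one.
Oban.Peer.leader?()
```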

Edit: sorry didn’t see this had already been posted.
