Do pause semantics in Oban queue configuration apply locally or globally?

Problem

I have some Oban jobs that depend on Phoenix’s Verified Routes to build URLs, which requires the Endpoint to be started. However, the Endpoint is started after Oban in the supervision tree. This creates a race condition on app startup: if one of these Oban jobs executes before the Endpoint starts, it will fail. For legacy reasons, these jobs are extremely brittle and cannot be retried. Refactoring them is a good idea, but not something I can accomplish in the near term.

Solutions I’ve Explored

  1. I cannot move the Endpoint before Oban in the supervision tree. Oban jobs can be inserted as a result of web traffic, and Oban needs to be started to insert jobs, so that ordering would create a race condition of its own.
  2. Build the URLs up by hand from the application config, bypassing the Endpoint functions. This has potential, but I do like the convenience of using verified routes, so I wanted to explore other avenues before resorting to this.
  3. Start Oban before the Endpoint, but run no jobs until the Endpoint has started.

Question

My question comes from researching what it would take to implement solution no. 3. Oban allows you to start queues in a paused state. This is promising, since I could start Oban with all queues paused, then start the Endpoint, and then resume all the queues. What I’m seeking clarity on is this: if I configure my Oban queues with paused: true, does that pause apply only to the local node, or globally across nodes? I do not want to pause the queues on other nodes that are already up and running.

Of course, if people have other solutions for how to resolve the root issue, I’d love to hear those as well.

Thanks in advance for any help!

That’s a great plan, and precisely why it’s possible to start queues in a paused state 🙂

Starting a queue with paused: true only applies to the local node. It doesn’t broadcast a pause message to the other nodes at all. You’d also want to scope resuming the queues to the local node. Something like this, using a Task that executes last in the supervision tree:

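# One-off task that resumes queues on this node only
resume_task = Task.child_spec(fn -> Oban.resume_all_queues(local_only: true) end)

# Children start in order, so the task runs after the Endpoint is up
children = [
  Repo,
  Oban,
  Endpoint,
  resume_task
]

Supervisor.init(children, ...)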
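
For completeness, the paused side of this could look roughly like the config below. It is only a sketch: the :my_app app name, MyApp.Repo, the queue names, and the limits are placeholders rather than anything from your project. Each queue takes a keyword list so that paused: true can sit alongside its limit.

```elixir
# config/config.exs
# Hypothetical app, repo, and queue names; adjust to your own setup.
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [
    # Each queue starts paused on this node and stays paused
    # until it is resumed, e.g. by the task in the supervision tree.
    default: [limit: 10, paused: true],
    mailers: [limit: 5, paused: true]
  ]
```

Every queue that runs the Endpoint-dependent jobs needs the paused: true flag, otherwise it may begin executing jobs as soon as Oban boots.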

Thank you for clarifying! I’ll give that a shot.