Oban Postgres Peers

I am testing an architecture with two instances of the application using Oban with Postgres peering. I want to see how the application processes jobs in this scenario before going on to configure more queues and workers.

For now I have a staging instance with a database, and I wanted to test by connecting a local instance to the same database. However, the oban_peers table still has only one peer record, for the staging instance. I expected a second peer to be added to oban_peers, but that is not happening. Is this expected, i.e. is only the leader saved in the peers table, or am I doing something wrong? The only thing I can think of that might be a problem, although I don’t know why, is that the local instance runs in the dev environment and the staging instance runs in a staging environment. Oban is started in both environments, though.

I couldn’t find any details in the docs, on the forum, or anywhere else I searched.

You aren’t doing anything wrong; that’s how the peers table works. Only the leader holds a record in oban_peers. There’s some background and more details here: Oban v2.11, Pro v0.10, and Web v2.9 Released · Oban Web+Pro

I’ll enhance the documentation about peers to answer some of these questions (or a PR is welcome, if you’re interested :pray:)

Thank you for the response! I had a thought that it might be like this. Your article was also really helpful and interesting.

As to updating the docs, I don’t mind creating a PR, but I’m not sure I understand everything well enough.

For instance, and this is a question that I’ve had on my mind as well, how does a job get assigned to the second node and not the leader, assuming there are 2 nodes total? Is it picked up by the node that first manages to acquire a lock, which with 2 identically configured instances I assume has an element of randomness, or is there some way that Oban distributes the work across the nodes? I’m sorry if this question is already answered somewhere; I couldn’t find it.

Leadership is only used for coordination by plugins (e.g. Cron, Stager). Queues operate independently across nodes. They aren’t “fair” and will process jobs as fast as possible while respecting the concurrency limit.

Does that mean that with 2 nodes (again, Postgres peers without a distributed cluster) both nodes will start processing a job?

Yes, provided they are connected to the same database, using the same prefix, and running the same queues.
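
As a minimal sketch of that setup, both nodes would boot with something like the configuration below. The names `:my_app` and `MyApp.Repo` are placeholders, and `Oban.Peers.Postgres` is the peer module name from the release discussed in this thread, so check the docs for your Oban version:

```elixir
# config/config.exs, loaded by every node
# (:my_app and MyApp.Repo are placeholder names)
import Config

config :my_app, Oban,
  # Same database on every node
  repo: MyApp.Repo,
  # Same prefix (schema) on every node; "public" is Oban's default
  prefix: "public",
  # Leader election through the oban_peers table; only plugins rely on it
  peer: Oban.Peers.Postgres,
  # Same queues on every node; the limit of 10 applies to each node separately
  queues: [default: 10]
```

With two nodes configured like this, up to 20 `default` jobs can run at once in total, and both nodes pull from the same queue.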

I see. In that case, can the global peering through Oban.Peers.Global and an Erlang cluster be used to distribute jobs to be run once on a single node with its global locks?

Edit: Excuse me, an additional question: with Postgres peering, can the leader be configured to process all jobs in order to avoid duplicate processing?

Jobs aren’t duplicated and each attempt can happen on any node, up to max_attempts.

Leadership has nothing to do with how/when/where jobs are processed. If you’d like to process jobs on a single node then you can start queues on just that node, but there wouldn’t be duplicate processing anyhow.
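
One way to do that, sketched under the assumption that the job-running node is selected with an environment variable (`OBAN_QUEUES_ENABLED` is a made-up name, as are `:my_app` and `MyApp.Repo`):

```elixir
# config/runtime.exs
import Config

# Hypothetical switch: export OBAN_QUEUES_ENABLED=true only on the node
# that should execute jobs.
queues =
  if System.get_env("OBAN_QUEUES_ENABLED") == "true" do
    [default: 10]
  else
    # `false` starts no queues on this node
    false
  end

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: queues
```

Nodes without queues can still insert jobs with `Oban.insert/2`; they just never execute them.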

But the attempt will happen on only one node? In other words, theoretically if I have 2 nodes one will pick the job up, the other won’t, but it might if it is retried?

What do you mean by “there wouldn’t be duplicate processing anyhow” if the queues run on all nodes?

I’m sorry for probably asking stupid questions.

That’s correct.

What I mean is each attempt only happens on one node. It isn’t a “fan out” situation where a job is executed on every node, which I’d consider “duplicate processing.”

Oh, I see. Earlier, when you said that both nodes will start processing a job provided they are connected to the same database, using the same prefix, and running the same queues, I thought you meant the same job. As I understand it now, you meant that each node will process a different job. Is that correct?

That sentence could be misread, but yes: any specific job is handled by only one node, while both nodes take jobs from the queue.

That’s correct. Sorry, I didn’t mean they would process the same job.
