How many connections does Oban need?

Hi. At work we are running an application with a lot of web traffic and background jobs. Some time ago we split the Oban database from the regular application database.
There are four web nodes and a separate worker node. Oban jobs are only processed by the worker node.

The issue that we are facing is that we get the following error frequently:

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2999ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information
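For context, options 3 and 4 from that list map onto the Ecto repo configuration. A sketch of how they would look on our Oban repo (the values for queue_target and queue_interval are just DBConnection's defaults, shown for illustration):

```elixir
# config/runtime.exs (sketch) -- options 3 and 4 from the error above
# are repo-level settings. Values shown are illustrative.
import Config

config :my_app, MyApp.ObanRepo,
  pool_size: 4,          # option 3: more connections, more resources
  queue_target: 50,      # option 4: max ms a checkout should wait...
  queue_interval: 1_000  # ...measured over this window, in ms
```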

According to AppSignal, this error always originates in lib/oban/stager.ex:86.

The thing is, our Oban repository is configured with pool_size: 4 and we have three queues. I seem to remember reading somewhere that Oban uses a connection per queue, plus an extra connection to listen for PG notifications.

In that case, 4 connections should be enough (3 for the queues + 1 for pub/sub notifications), but the error we are seeing makes me think otherwise.
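For reference, the setup looks roughly like this (queue names and concurrency limits here are placeholders, not our real ones):

```elixir
# config/config.exs (sketch) -- three queues on a dedicated Oban repo.
# Queue names and concurrency limits are placeholders.
import Config

config :my_app, Oban,
  repo: MyApp.ObanRepo,
  queues: [default: 10, mailers: 20, imports: 5]
```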

Does anyone know how many connections Oban uses for internal operations?

EDIT: thinking about this further, my understanding is that the stager also runs on the four web nodes, even though they don’t process any queues. I see that the stager does nothing if the peer is not the leader, but it still connects to the database to perform a noop transaction. I’ve also seen that the stager won’t be initialized if we configure stage_interval: :infinity. Would that be an appropriate configuration for the web nodes? Something like the sketch below is what I have in mind.
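(The WORKER_NODE env var here is purely our own hypothetical convention for telling the node types apart.)

```elixir
# config/runtime.exs (sketch) -- disable staging on the web nodes,
# which run no queues anyway. WORKER_NODE is a hypothetical env var.
import Config

if System.get_env("WORKER_NODE") do
  config :my_app, Oban, stage_interval: :timer.seconds(1)
else
  config :my_app, Oban, stage_interval: :infinity, queues: []
end
```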


Which Oban version are you using? The recent v2.15 release has query improvements that significantly speed up job staging for larger tables.

Oban shares connections from the pool, plus one additional dedicated connection for the Postgres notifier. It doesn’t matter how many queues or plugins you run; Oban doesn’t take more connections.

You’re welcome to disable staging on web nodes by setting the interval to :infinity. However, it seems like you have some slow queries monopolizing the connection pool if a noop transaction can’t grab a connection.


Which Oban version are you using? The recent v2.15 release has query improvements that significantly speed up job staging for larger tables.

We are currently using Oban 2.14.2. I took a look at the 2.15 release notes and it looks like the changes may be helpful. I’ll update and see whether the problem is resolved or mitigated.

Oban shares connections from the pool, plus one additional dedicated connection for the Postgres notifier. It doesn’t matter how many queues or plugins you run; Oban doesn’t take more connections.

You’re welcome to disable staging on web nodes by setting the interval to :infinity. However, it seems like you have some slow queries monopolizing the connection pool if a noop transaction can’t grab a connection.

Sorry, but I’m not sure I understand this. The issue is happening in a repository that points to a different database and is only used by Oban. The application code uses a different repository, with a different connection pool, pointing to a different database. Is it possible that the application operations using MyApp.Repo could affect MyApp.ObanRepo as well?

Right, I understood that. My point is that some slow queries are monopolizing connections, which implies that the pool size is too small or the database can’t manage the load.

If you identify slower queries or bottlenecks, I’m eager to help optimize queries/connections.
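If it helps, here is a hedged sketch of a telemetry handler that flags slow queries on the Oban repo. The event name assumes the default telemetry prefix for MyApp.ObanRepo, and the 200ms threshold is arbitrary:

```elixir
defmodule MyApp.SlowQueryLogger do
  # Sketch: log Oban-repo queries whose total time exceeds a threshold.
  # Assumes the default telemetry prefix derived from MyApp.ObanRepo.
  require Logger

  @threshold_ms 200

  def attach do
    :telemetry.attach(
      "oban-repo-slow-queries",
      [:my_app, :oban_repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event(_event, measurements, metadata, _config) do
    total_ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)

    if total_ms > @threshold_ms do
      Logger.warning("slow Oban query (#{total_ms}ms): #{metadata.query}")
    end
  end
end
```

Calling MyApp.SlowQueryLogger.attach() from your application’s start/2 would wire it up; AppSignal may already surface the same data, but this keeps it in your own logs.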

Oh, that makes sense! I’ll update Oban to 2.15 first and then increase the pool size, as 4 seems too low. If the problem still persists, I’ll try to identify the most impactful queries to see whether they can be optimised.

Thanks! :bowing_man:

Updating Oban improved the situation a bit, but we still had errors here and there. In the end we increased the pool size, as we found that the stager errors happened at the same time as a heavyweight job was running (one that itself inserted about 50k jobs).
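For anyone who finds this later: chunking bulk inserts is another way to keep any single transaction from holding a connection for long. A minimal sketch, with a hypothetical worker and args list:

```elixir
# Sketch: insert ~50k jobs in batches rather than one giant call, so
# each query (and its connection checkout) stays short.
# MyApp.SomeWorker and args_list are hypothetical.
args_list
|> Enum.map(&MyApp.SomeWorker.new/1)
|> Enum.chunk_every(1_000)
|> Enum.each(&Oban.insert_all/1)
```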
