I am running Oban in production and I am investigating a query that is constantly using 50% of my database’s CPU. This query is called approximately every second and the “clean” SQL is:
SELECT queue, state, count(id) FROM oban_jobs GROUP BY queue, state
I believe this query must be called from ObanWeb (which as far as I am aware uses Oban Met) but I have been looking at the code and I cannot find this exact query anywhere. I have found similar queries but all of them filter the state / queue to use optimizations depending on the previous values.
For instance, my oban_jobs table has about 2 million jobs (1.95 million in the completed state).
My versions are:
Oban 2.18.3
ObanWeb 2.10.6 (from the oban repo)
ObanPro 1.4.14 (from the oban repo)
And the database I am running is an AWS Aurora RDS PostgreSQL16 instance.
That’s a counting query used by oban_met. Counting is always expensive. That’s why newer versions will always fall back to estimation.
Do you need 2 million jobs? If not, pruning is recommended. Minimizing the size of the table would definitely help with the queries.
Many thanks for sharing your versions; however, we need to know which version of oban_met? Newer versions of met will do automatic estimation to prevent expensive queries.
Kinda . We are working on reducing the table size as we want to upgrade to Pro 1.5 in the following weeks.
Most of those jobs come from our workflowsqueue which cannot be pruned easily because it runs jobs that are Pro Workflows and we don’t want issues with dependencies when some of them are deleted. As far as I know we had some issues in the past (I was not part of the team at that time so I don’t know the details) so we added a 16d age on that queue as some workflows dependencies could be scheduled at most 16d in the future.
The oban_met version isn’t explictly declared on my mix.exs file. Looking at my lock file the installed oban_met version is 0.1.11 (dependency of oban_web).
Taking into account the number of records in my oban_jobs table I would assume that the optimization should be used instead of the exact count. However it does not seem to be the case.
Some extra details about our configuration: we use a kubernetes (not erlang/otp) cluster of N web nodes + a worker node and all nodes run the same application. The only difference is that our web nodes are configured with queues: false and peer: false so that there is a single Oban instance that runs our queues (that is the purpose of the worker node).
Since our worker node does not start the web server the oban web dashboard is only accessible from web nodes.
There’s an ignore_deleted option for workflows so that you can swerve this situation.
That’s absolutely new enough to have the estimate function! For some reason, the estimate function isn’t available? It may be that the application role that you’re connecting to the database with doesn’t have the permissions necessary to create a new function. It a hunch.