@sorentwo, or anyone really: I may need some help with Oban.
My setup:
oban 2.19.4
oban_pro 1.6.2
oban_web 2.11.1
For the past couple of days, some jobs have been getting stuck in “scheduled”. I think this is related to the number of jobs scheduled in a particular queue. I currently have 3.5k jobs in “scheduled”, and for most of them the scheduled_at time has already passed, but they have not executed.
In fact, they did not move from “scheduled” to “available”, and when I click “Run now” in Oban Web, they move to “available” but do not get executed either.
Other queues are executing just fine.
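For anyone who wants to compare numbers, here is a rough sketch of how the overdue jobs can be counted per queue from IEx. It assumes the standard `Oban.Job` schema (with its `state`, `queue` and `scheduled_at` fields) and the `DB.Repo` repo from my config; the variable names are just for illustration.

```elixir
import Ecto.Query

# Anything still "scheduled" with scheduled_at at or before this moment is overdue.
now = DateTime.utc_now()

overdue_by_queue =
  Oban.Job
  |> where([j], j.state == "scheduled" and j.scheduled_at <= ^now)
  |> group_by([j], j.queue)
  |> select([j], {j.queue, count(j.id)})
  |> order_by([j], desc: count(j.id))

# Returns a list of {queue_name, overdue_count} tuples, largest backlog first.
DB.Repo.all(overdue_by_queue)
```

A queue that is healthy should show up here only briefly, if at all, since overdue jobs are normally staged to “available” within about a second.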
I have already tried reindexing the Oban tables, but the jobs are still stuck in “scheduled”.
Any ideas much appreciated.
My Oban Config:
defp oban_config do
  [
    log: false,
    repo: DB.Repo,
    engine: Oban.Pro.Engines.Smart,
    plugins: [
      {
        Oban.Pro.Plugins.DynamicPruner,
        state_overrides: [
          cancelled: {:max_age, {5, :days}},
          completed: {:max_age, {5, :days}},
          discarded: {:max_age, {7, :days}}
        ]
      },
      {
        Oban.Pro.Plugins.DynamicCron,
        timezone: "America/Chicago",
        crontab: []
      },
      {
        Oban.Plugins.Cron,
        crontab: [
          # 10 crontab entries removed here
        ]
      },
      Oban.Pro.Plugins.DynamicLifeline,
      Oban.Plugins.Reindexer
    ],
    queues: [
      my_queue1: [limit: 3]
      # 20+ queues with limit 1-10 below
    ]
  ]
end
This happened yesterday on one queue, and I rescheduled the jobs (again around 3.5k) by inserting them in smaller batches. Now it’s happening with a different queue with about the same number of scheduled jobs.
I suspect something breaks with a high number of jobs, but honestly 3.5k is not that many.
Related: I managed to “fix” the issue once, but I’m not sure why it worked.
What I did was the following:
- From psql, I updated the jobs’ queue to a different queue. This did not work; they were still not executing.
- From psql, I updated their state to ‘available’. This didn’t help either.
- Then another job that normally executes on the other queue was inserted. This triggered all of these available jobs to be executed.
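To make the first two steps concrete, this is roughly what they amount to, written with Ecto instead of the raw SQL I actually ran in psql. The queue names `"stuck_queue"` and `"other_queue"` are placeholders, not my real queues, and the sketch assumes the standard `Oban.Job` schema and `DB.Repo`.

```elixir
import Ecto.Query

# Step 1: move the stuck jobs to a different queue.
# "stuck_queue" and "other_queue" are hypothetical names.
stuck =
  from(j in Oban.Job,
    where: j.queue == "stuck_queue" and j.state == "scheduled"
  )

{moved_count, _} = DB.Repo.update_all(stuck, set: [queue: "other_queue"])

# Step 2: flip the moved jobs from "scheduled" to "available" by hand.
moved =
  from(j in Oban.Job,
    where: j.queue == "other_queue" and j.state == "scheduled"
  )

{staged_count, _} = DB.Repo.update_all(moved, set: [state: "available"])
```

Neither update on its own got the jobs running; they only started once a regular job was inserted into the other queue.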
Maybe I have something messed up with pg_notify or similar, but I thought these queues were supposed to do polling too.
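In case it helps with diagnosing, these are the checks I can run from IEx on a node. This is only a sketch: it assumes the default Oban instance name, uses `:my_queue1` from my config as the example queue, and I am not certain which keys the returned map contains under the Smart engine.

```elixir
# Inspect the local producer for one queue (paused flag, limit, running jobs, etc.).
queue_state = Oban.check_queue(queue: :my_queue1)
IO.inspect(queue_state, label: "my_queue1")

# Staging from "scheduled" to "available" is done by the leader node,
# so it seems worth confirming that some node actually holds leadership.
leader? = Oban.Peer.leader?()
IO.inspect(leader?, label: "this node is leader")
```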
I also have quite a few of these in the database logs:
duplicate key value violates unique constraint "oban_jobs_unique_index"
but I suspect these are expected if one uses unique jobs (?). I have thousands of these.