1.5 Chain worker backing up in 'scheduled' state

Hey folks! We’ve switched to using the new chain worker, but we’ve ended up in a situation where all of the jobs seem to be stuck in the 'scheduled' state with a scheduled date far in the future.
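
For context, the worker is set up roughly like the sketch below (module and helper names are illustrative, not our real code, and the Oban.Pro.Workers.Chain options are from memory, so double-check them against the Pro docs). Chain workers hold newer jobs in 'scheduled' with an on_hold meta flag until the previous job in the same chain finishes.

defmodule MyApp.FirehoseWorker do
  # Chain per shipment so events for a shipment are processed strictly in order.
  use Oban.Pro.Workers.Chain,
    queue: :firehose,
    by: [args: :shipment_id]

  @impl true
  def process(%Oban.Job{args: %{"shipment_id" => shipment_id, "event_id" => event_id}}) do
    # MyApp.Firehose.publish/2 is a placeholder for our Pub/Sub push.
    MyApp.Firehose.publish(shipment_id, event_id)
  end
end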

What’s going on with this? How can we unstick it? We’re at 600k+ jobs sitting in 'scheduled' right now. For the moment I’ve been doing:

update oban_jobs
set scheduled_at = now(),
    state = 'available',
    meta = jsonb_set(meta, '{on_hold}', 'false', true)
where id in (
  select distinct on (args['shipment_id']) id
  from oban_jobs
  where queue = 'firehose' and state = 'scheduled'
  order by args['shipment_id'], args['event_id'] asc
)

to basically grab the earliest event for each shipment and force-enqueue that job. This seems pretty hacky, though.
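
For anyone who’d rather stay in Elixir, roughly the same one-off cleanup can be run from iex with Ecto (a sketch, untested; MyApp.Repo stands in for your application’s repo):

import Ecto.Query

# Earliest held event per shipment, mirroring the distinct-on subquery above.
earliest_per_shipment =
  from j in Oban.Job,
    where: j.queue == "firehose" and j.state == "scheduled",
    distinct: j.args["shipment_id"],
    order_by: [asc: j.args["shipment_id"], asc: j.args["event_id"]],
    select: j.id

# Flip those jobs back to 'available' and clear the on_hold flag.
MyApp.Repo.update_all(
  from(j in Oban.Job,
    where: j.id in subquery(earliest_per_shipment),
    update: [
      set: [
        state: "available",
        scheduled_at: fragment("now()"),
        meta: fragment("jsonb_set(meta, '{on_hold}', 'false', true)")
      ]
    ]
  ),
  []
)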

EDIT: This doesn’t actually work all that well to unstick things. Not sure what’s going on, but the job runtimes are quite high (~2 seconds for what should be a ~40ms push to Cloud Pub/Sub) and the number of running jobs at any moment seems low.
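
In case it helps anyone debugging similar symptoms, this is roughly how we’ve been measuring where the time goes, using Oban’s standard [:oban, :job, :stop] telemetry event (the handler id, threshold, and logging here are our own, not anything Oban provides):

require Logger

:telemetry.attach(
  "firehose-job-timing",
  [:oban, :job, :stop],
  fn _event, %{duration: duration}, %{job: job}, _config ->
    # :duration is reported in native time units.
    ms = System.convert_time_unit(duration, :native, :millisecond)

    if job.queue == "firehose" and ms > 500 do
      Logger.warning("slow firehose job #{job.id} took #{ms}ms")
    end
  end,
  nil
)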

EDIT 2: Forgot to remove the global partitioning config from the queue. Problem solved!
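
For anyone who hits the same thing: the config we removed looked roughly like the snippet below. This is a sketch from memory, so treat the exact global_limit/partition syntax as an assumption and check the Oban Pro Smart engine docs. Partitioning the global limit by shipment_id on top of the chain worker’s own per-shipment ordering seems to be what kept jobs parked in 'scheduled'.

config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  queues: [
    firehose: [
      local_limit: 10,
      # Removing this global partitioning block is what fixed the backup.
      global_limit: [allowed: 1, partition: [fields: [:args], keys: [:shipment_id]]]
    ]
  ]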

Hey ya! Wen Bilson! :wink:

This is akin to a test that a cheeky teacher gives where the final question is:
Don’t answer a single question on this test.

SO glad you solved your issue! Global partitioning and its interplay with the chain worker is def the issue.

Hey @sorenone!

While this did generally clear out the queue, we’re still getting a bit of a backup for reasons that aren’t clear to me. Given a specific job that’s in this on-hold / scheduled-far-in-the-future state, is there any way to query what it’s waiting on?

It is possible to query it. I don’t want to say “it could be a bug…” @sorentwo :eyes:
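
Roughly something like this should surface what a held job is waiting on, assuming your chain is keyed on a shipment_id arg (just a sketch, not an official Pro query; MyApp.Repo stands in for your app’s repo):

import Ecto.Query

# `job` is the held %Oban.Job{} you're inspecting. Earlier, unfinished jobs
# with the same shipment_id are what it's chained behind.
blocking =
  MyApp.Repo.all(
    from j in Oban.Job,
      where: j.queue == ^job.queue,
      where: fragment("?->>'shipment_id' = ?", j.args, ^to_string(job.args["shipment_id"])),
      where: j.id < ^job.id,
      where: j.state not in ["completed", "cancelled", "discarded"],
      order_by: [asc: j.id]
  )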

It can happen from race conditions but it shouldn’t be a frequent occurrence. Will investigate.