Make Oban pick jobs in random order from the queue

I’ve read the official Oban documentation and couldn’t find anything related to this. It mentions that we can change the order via the priority attribute, but that doesn’t suit my case.

In my application, the user uploads a .xml sheet that generally has around 1.3k rows, and we dispatch a new Oban job for each row. The problem: if one user submits at 09:00 AM and another submits at 09:01 AM, the second one has to wait until all of the first user’s jobs complete before any of their own jobs are processed. Since each job takes around 5-10 seconds, the second user can end up waiting far longer than we’d like.

As most of the uploaded sheets are about the same size, I thought that if we could pick jobs from the queue in random order, the uploads would be processed virtually simultaneously.

Am I missing something? Do you guys suggest another approach?

Parallel processing instead? Why serial? 🙂

I forgot to mention: they are already being processed in parallel, but all the processes pull from the queue in the order the jobs were inserted, so the user who inserted last keeps being punished.

The simplest way is to put them into different queues

If you need to keep them all in the same queue, then a somewhat simple approach to make selection fairer is to randomize the priorities. Since priority trumps scheduled time, this interleaves jobs across the various insert times. Something like this:

# Give every job a random priority; in Oban, lower numbers run first
# (0 is highest), so jobs from different uploads interleave instead of FIFO.
args
|> Enum.map(&MyApp.Job.new(&1, priority: Enum.random(0..3)))
|> Oban.insert_all()

This idea came to my mind too, but the problem with it is that Oban queues need to be configured in advance, and I can’t create or delete them at runtime. On the other hand, the number of users is dynamic and constantly changing, so I can’t predict how many queues I’d need.

I could make something like a round robin across n predefined queues, but it wouldn’t scale.
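A minimal sketch of that round-robin idea anyway, assuming a hypothetical MyApp.Job worker and a fixed pool of queues that are already in the Oban config:

defmodule MyApp.Importer do
  # The queues must already exist in the Oban config; we only rotate across them.
  @queues [:imports_0, :imports_1, :imports_2]

  # Send each upload to the next queue in the pool, keyed by a per-upload
  # counter, so concurrent uploads land on different queues.
  def enqueue(rows, upload_counter) do
    queue = Enum.at(@queues, rem(upload_counter, length(@queues)))

    rows
    |> Enum.map(&MyApp.Job.new(&1, queue: queue))
    |> Oban.insert_all()
  end
end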

It makes sense!!

I’ll give it a try. Thanks

Actually, you can. Oban — Oban v2.17.1
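The relevant functions are Oban.start_queue/1 and Oban.stop_queue/1. A minimal sketch, where the :uploads queue name and limit are assumptions:

# Start a queue at runtime; by default this applies to all connected nodes.
Oban.start_queue(queue: :uploads, limit: 10)

# Stop it again once its jobs are drained.
Oban.stop_queue(queue: :uploads)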

But I’m talking about predefined queues. Like, one queue for files with < 100 rows, another for files with < 1000 rows, etc.


That’s great!! I totally missed this option. There’s no need to keep all of the jobs in the same queue; I guess I could just spawn and kill queues depending on the demand.

And now I get the idea of multiple predefined queues. I’ll give it a try too.

Thanks a lot!!

Randomising probably wouldn’t help at all.

If you have 10 workers and an 11th job gets added, it can’t be processed, regardless of its priority/randomness, before one of the workers finishes its current task.

And if you now add a 12th, 13th, and 14th job and process them in random order, the 11th could potentially run even after 12, 13, and 14, meaning 11 has to wait even longer than with in-order processing!


Randomizing wouldn’t be nearly as helpful as using multiple queues, but it would make processing between multiple accounts fairer. As a simple example, imagine you have three accounts a, b, c, and they each insert three jobs with a randomized or rotated priority (ignoring scheduled time):

id  account  priority
 1  a        0
 2  a        1
 3  a        2
 4  b        0
 5  b        1
 6  b        2
 7  c        0
 8  c        1
 9  c        2

Based on the priority sorting, Oban will process jobs in this order:

id  account  priority
 1  a        0
 4  b        0
 7  c        0
 2  a        1
 5  b        1
 8  c        1
 3  a        2
 6  b        2
 9  c        2

The result is somewhat fair intermixed processing between the multiple accounts.
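As a minimal sketch of the rotated variant above, assuming a hypothetical MyApp.Job worker and one account’s rows per call:

# Cycle priorities 0..2 through the batch so each account ends up with
# jobs at every priority level, yielding the interleaving shown above.
jobs =
  rows
  |> Enum.with_index()
  |> Enum.map(fn {row, index} -> MyApp.Job.new(row, priority: rem(index, 3)) end)

Oban.insert_all(jobs)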

That said, using multiple queues in some fashion is a much better solution!


Why does each job take 5-10s to complete? More specifically, is your server crunching a bunch of data? Or mostly sitting around waiting for an external service like an API or database to complete some work?

If the former, it sounds to me like you’re doing an immense amount of work and the only solution is more hardware. If the latter, increasing the number of parallel jobs per queue should speed things up, without bogging down your server, thanks to the concurrency properties of the BEAM.

Either way, you have a traffic jam and it sounds like the only real solution is to widen the roads.
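If it is the IO-bound case, widening the road is mostly a config change. A sketch, where the :my_app OTP app name and :imports queue are assumptions:

# config/config.exs
import Config

# Raise the per-queue limit so more jobs run concurrently on each node.
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [imports: 50]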


Thanks again for all the answers!! It’s my first post here and you guys have been very helpful.

@NobbZ, what @sorentwo said covers everything I want. The point is not the parallelism nor accelerating the process itself, but making the process fairer.

The example he gave is the happy path. We’ve run some tests in production and it doesn’t play out perfectly, but now if user a starts at 10:00 and user b starts at 10:00:01, then when the first reaches 50% the second is somewhere around 45~55%. Without the random priority, user b would still be at 0% when a reached 100%.

@paulstatezny they take 5-10s to complete because they’re fetching from an API. We actually tried to increase the parallelism, but the server we’re requesting from died (😢), so unfortunately we’re at our limit there. Our next step is to cache part of the data we’re fetching and spread the jobs across multiple queues, because there is one scenario where random priority causes a problem: user a submits a job at 10:00, user b submits at 10:01, and it goes on like that for 26 minutes. All of the jobs from users a, b, c, d… that drew a low priority would have to wait until user z’s higher-priority jobs finished. Fortunately this doesn’t happen now, as our user base is not that big, but we have this case in mind.

Am I missing something?

I think I’m missing something, or maybe I’m just too sleepy, but why would picking them in random order speed things up? In your example, if you interleaved all the jobs from 10:00 and 10:01, the 10:00 user would basically have to wait until both batches finished, because they are not running in parallel. So instead of 26 minutes, both users would wait close to 52 minutes. And it would keep piling up as more users started uploading.

I would advise you to rethink your design a bit. Maybe just show users that their uploaded file is queued for processing, along with their queue position or whether processing has started? Also, the number of API calls you’re making, if you really do one call per row, sounds like a lot.


@wanton7 is right. This external service that can’t handle your needs is a bottleneck. Does your company control this API? If it’s a service you’re paying for, they should be able to scale up to meet your needs.
