Possible to limit number of active workflows in Oban Pro?

We have a workflow consisting of a linear graph (for now). Every job runs in the same queue, which has a limit of 2.

So simplified the workflow looks like this.
Start Machine -> Run job 1 on machine (depends on start machine) -> Run job 2 on machine (depends on job1) -> Run job 3 on machine (depends on job2) -> Terminate machine (depends on job3)

Every worker has the same priority except the one that starts a new machine, which has a lower priority. The idea was that all other jobs in a workflow would then be executed before another machine is booted, or so I thought.
Most of the time the workflow works as expected: I can see a lot of “starting machine” jobs in the available state and all the other tasks in the scheduled state. But after a workflow finishes, sometimes one “start machine” job gets triggered, and then that workflow is paused while another one runs to completion, which means a machine sits idle for a long time.

So to summarize, the queue has a limit of 2 but we can still have more than 2 machines running, even though everything is in a workflow and the priority of the machine-starting jobs is lower.

I can kind of guess why this is happening, but I’m curious whether there’s a way to limit the number of active workflows, or another way of tackling this. Once a machine is running, that job is finished, so we can’t keep the start jobs in a separate queue as a way of blocking.

oban 2.18.3
oban_pro 1.5.0-rc.4

One mechanism for limiting the active workflows would be to use a global limit and partition by a tenant id. The tenant_id would be set for each workflow (like the workflow_id, but it’s not possible to partition by meta currently). That could look like this, using a local limit of 2 and a global limit of 1, so two separate workflows can run at the same time but only 1 job in each workflow:

my_queue: [local_limit: 2, global_limit: [allowed: 1, partition: [args: :tenant_id]]]
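For context, here is a rough sketch of how that queue option and the per-workflow tenant_id might fit together. The app name, repo, and worker modules (`MyApp.StartMachine`, `MyApp.Job1`, and so on) are hypothetical, and generating a fresh `tenant_id` per workflow is only one way of setting it. Global limits need the Smart engine, so the sketch assumes it is configured.

```elixir
# config/config.exs
# Assumption: the app is called :my_app and uses MyApp.Repo.
config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  repo: MyApp.Repo,
  queues: [
    my_queue: [
      local_limit: 2,
      global_limit: [allowed: 1, partition: [args: :tenant_id]]
    ]
  ]

# Elsewhere, when enqueuing one workflow.
# Every job in the workflow carries the same tenant_id in its args,
# so the partitioned global limit applies to the workflow as a whole.
alias Oban.Pro.Workflow

# Assumption: one fresh id per workflow; any stable value would do.
args = %{tenant_id: Ecto.UUID.generate()}

Workflow.new()
|> Workflow.add(:start, MyApp.StartMachine.new(args))
|> Workflow.add(:job1, MyApp.Job1.new(args), deps: [:start])
|> Workflow.add(:job2, MyApp.Job2.new(args), deps: [:job1])
|> Workflow.add(:job3, MyApp.Job3.new(args), deps: [:job2])
|> Workflow.add(:terminate, MyApp.TerminateMachine.new(args), deps: [:job3])
|> Oban.insert_all()
```

Since the graph is linear, only one job per workflow is runnable at a time anyway. The partition matters mainly because it gives each workflow its own key to limit on.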

Ultimately you either need to run workflows on a pool of machines or isolate each machine to only run a single queue. Some ideas for other possible ways to handle this:

  1. Consider using FLAME to dynamically start and stop the external machine. Each job in the workflow would make a FLAME call and they’d naturally pool to use the correct number of machines.
  2. Use a per-minute cron job to check whether the queue is empty and shut down the external machine when there aren’t jobs to run. This isn’t as elegant, and it is prone to race conditions.
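As a minimal sketch of the first idea, assuming a FLAME backend is already configured for wherever the machines run: the pool caps the number of machines at 2 and shuts idle ones down, so the workflow no longer needs explicit start and terminate jobs. `MyApp.MachinePool`, `MyApp.Job1`, and `MyApp.Tasks.run_step/2` are hypothetical names, and the idle timeout is an arbitrary value.

```elixir
# In the application's supervision tree.
# At most 2 runner machines, booted on demand, one call per machine,
# and shut down after 30 seconds without work.
children = [
  {FLAME.Pool,
   name: MyApp.MachinePool,
   min: 0,
   max: 2,
   max_concurrency: 1,
   idle_shutdown_after: :timer.seconds(30)}
]

defmodule MyApp.Job1 do
  use Oban.Pro.Worker, queue: :my_queue

  @impl Oban.Pro.Worker
  def process(%Oban.Job{args: args}) do
    # The function runs on a pooled machine. FLAME boots one if none
    # is free and the pool is below its max.
    # Assumption: run_step/2 is your own code and returns :ok or {:ok, value}.
    FLAME.call(MyApp.MachinePool, fn -> MyApp.Tasks.run_step(1, args) end)
  end
end
```

With this shape the machine lifecycle is tied to actual work rather than to a job's position in the workflow, so a machine should not sit idle while another workflow runs.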

Thank you for the detailed response. For now I will live with the occasional extra machine during a transition; it will be cheaper than having another machine always on for workflows right now 🙂