Barrier synchronization for Oban jobs

I am using Oban to create a group of tasks, all linked to the same parent. Once all the tasks are complete, I want to record that on the parent so that a different status can be rendered to the user.

Additionally, there is no guarantee on the order or concurrency of task execution: some tasks might fail and require a retry, and the server might be restarted when a new deploy is done (which is why Oban is used in the first place).

I was thinking of a few approaches:

  1. Execute a query at the end of every job to check whether the other jobs are complete and, if so, update the status. I fear this will not work correctly in a concurrent setting; maybe someone knows more on this topic;
  2. Have a background worker periodically check the number of completed jobs and update the status. This implies some delay and extra queries, but for my project it is a viable solution;
  3. Monitor the tasks with a process, and re-synchronize that process from the database when the server is restarted.

If you have a better idea, it would be great to hear a different perspective.

Maybe the managers aren’t looking to pay, but Oban Pro seems to have what you’re looking for: https://oban.dev/docs/pro/1.2.2/Oban.Pro.Workers.Chunk.html

Though I’m not 100% sure. If you add DB updating logic at the end of each job in the chunk, and have another job sniffing the database and waiting for the data structure to say “all related sub-jobs have completed successfully”, then it will likely achieve what you need.

Though now that I think of it, that could be achieved with Oban’s free version as well.

Sadly Oban Pro is out of the question: there are no resources for such an expense, and it seems like overkill for such a simple feature.

So you are in favor of solution 2? Makes sense, as it would avoid having to deal with any race conditions.

Hey, did you end up with a working solution that doesn’t require Oban Pro? That would help me a lot.

Thanks!

I never got to that, as the project was frozen for the time being. However, implementing the solution I outlined as number 2 should be more than good enough if you don’t need this to happen instantly.

This can be as easy as creating a GenServer that queries the database every N seconds or minutes and updates the status. It should take no more than a few lines of code that are hard to get wrong.
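The periodic check described above boils down to one idempotent statement that a timer runs every N seconds. Below is a minimal sketch of that statement, written in Python with SQLite only so that it runs standalone; in an Elixir app the same SQL would presumably go through Ecto from the GenServer. The `parents` and `jobs` tables, their columns, and the status values are all hypothetical and are not Oban's own `oban_jobs` schema.

```python
import sqlite3

# Hypothetical schema for illustration; not Oban's own tables.
POLL_SCHEMA = """
CREATE TABLE parents (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parents(id),
    state TEXT NOT NULL
);
"""


def sweep_finished_parents(conn):
    """Mark parents whose jobs are all completed; return how many rows changed.

    Note: a parent with no job rows at all also matches and is marked done.
    """
    cur = conn.execute(
        """
        UPDATE parents
           SET status = 'done'
         WHERE status = 'processing'
           AND NOT EXISTS (
                 SELECT 1 FROM jobs
                  WHERE jobs.parent_id = parents.id
                    AND jobs.state <> 'completed')
        """
    )
    conn.commit()
    return cur.rowcount
```

Because the statement only touches parents that are still `processing`, running it twice, or from two nodes at once, should be harmless: the second run simply finds nothing left to update.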

OK thanks, I’ll try that!

Your first approach can work too, as long as the “check and update” operation is synchronized among the many jobs.
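One way to synchronize the “check and update” is to skip the separate check and let each job decrement a counter on the parent row, flipping the status in the same statement. In PostgreSQL, concurrent updates to the same row queue on its row lock, so exactly one job should observe the counter reaching zero. The sketch below is an assumption-laden illustration, not Oban's mechanism: it uses Python with SQLite so it runs standalone, and the `parents.pending_jobs` counter and the `jobs.done` flag are hypothetical columns that the application would maintain itself.

```python
import sqlite3

# Hypothetical schema for illustration; not Oban's own tables.
COUNTER_SCHEMA = """
CREATE TABLE parents (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'processing',
    pending_jobs INTEGER NOT NULL
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parents(id),
    done INTEGER NOT NULL DEFAULT 0
);
"""


def complete_job(conn, job_id):
    """Record one job as done; return True only for the call that closes the barrier."""
    with conn:  # one transaction: commit on success, roll back on error
        flipped = conn.execute(
            "UPDATE jobs SET done = 1 WHERE id = ? AND done = 0", (job_id,)
        ).rowcount
        if flipped == 0:
            return False  # unknown job, or already counted (e.g. a re-run)
        # The right-hand sides read the pre-update row, so `pending_jobs - 1`
        # is the value the counter is about to take.
        conn.execute(
            """
            UPDATE parents
               SET pending_jobs = pending_jobs - 1,
                   status = CASE WHEN pending_jobs - 1 = 0
                                 THEN 'done' ELSE status END
             WHERE id = (SELECT parent_id FROM jobs WHERE id = ?)
            """,
            (job_id,),
        )
        (status,) = conn.execute(
            "SELECT status FROM parents"
            " WHERE id = (SELECT parent_id FROM jobs WHERE id = ?)",
            (job_id,),
        ).fetchone()
        return status == "done"
```

The `done` flag makes the decrement idempotent per job, so a job that is retried after it was already counted does not decrement the counter twice. If retries are inserted as brand-new jobs, they would need to reuse the original job's row here (or be counted when inserted) for the counter to stay correct.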

Your fears are right. I used the first option and I am facing race condition issues. I came here looking for possible solutions too.

If you have a Pro subscription, I believe this is a case for Workflow.Worker.

https://hexdocs.pm/oban/2.11.0/workflow.html

That looks like a good idea. However, I have a concern: I tried using Batch in the past and dropped it because we handle our own retries. We wanted to retry jobs only for specific errors, not all of them, so we insert each retry as a new job. I believe I may run into the same issue with Workflow.

The other issue I had with Batches (which, according to the docs, is the same for Workflows) is how return values are recorded. From the Oban.Pro.Worker docs (Oban Pro v1.5.0-rc.9):

If your process function returns an {:ok, value} tuple, it is recorded. Any other value, i.e. an plain :ok, error, or snooze, is ignored.

We needed to record errors as well.

Errors are recorded in the errors array on the job anyhow. So if you return {:error, error}, or raise, or crash, that’s listed in errors, but it’s not duplicated as a recorded value.

Hi, yes I’m aware. Unfortunately, for our use case we need the error value recorded, as we save it in a different table to be used as part of our data, and retrieving it by querying the oban_jobs table is not feasible. The error message in the errors array is also converted into a string message, so we would need to re-parse our already parsed errors in that case. However, we do find the errors array useful for viewing the raw errors :smile: