Handling Oban insert errors when database resources are stretched

swelham · June 10, 2021, 10:51am

Does Oban offer support for recovering jobs that fail to get inserted when using Oban.insert or Oban.insert_all?

For example, say I have a list of ten thousand items which I am inserting in chunks of 1000 (to be faster and use fewer DB connections). As soon as the first chunk is inserted, Oban will start processing those jobs. Each of these jobs needs to make a database request, so at this point all of my DB connections (say I have 100 in the pool) are in use. This means my next insert_all is going to be queued waiting for a DB connection. If the request for a connection times out, the insert_all will error and that chunk of jobs will never get inserted.

Our current background processing library (faktory_worker) handles this for us by making the pushing (inserting in Oban) of the job asynchronous and retrying the push when something goes wrong. In the example above, this would also allow the caller to continue processing the chunks ensuring none of the jobs are lost.

From what I can tell, it’s expected that the error handling for this case is left to the caller. If so, I would be interested to hear of any approaches anyone else has taken to handle this.

sorentwo · June 10, 2021, 7:40pm

No, there isn’t Sidekiq Pro style “reliable push” in Oban. Typically you’ll insert jobs in a transaction, often with other related records, and you want to keep those transactional guarantees.
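As an illustration of inserting a job in the same transaction as a related record, here is a minimal sketch using `Ecto.Multi` with `Oban.insert`. The `MyApp.Repo`, `MyApp.Import`, and `MyApp.ImportWorker` modules are hypothetical names, not from this thread, and the sketch assumes an application with Ecto and Oban already configured.

```elixir
defmodule MyApp.Imports do
  alias Ecto.Multi

  # MyApp.Repo, MyApp.Import and MyApp.ImportWorker are hypothetical modules.
  # The record and the job are committed together or not at all.
  def create_import(attrs) do
    Multi.new()
    |> Multi.insert(:import, MyApp.Import.changeset(%MyApp.Import{}, attrs))
    |> Oban.insert(:job, fn %{import: import} ->
      # Build the job from the record inserted earlier in the same transaction.
      MyApp.ImportWorker.new(%{import_id: import.id})
    end)
    |> MyApp.Repo.transaction()
  end
end
```

If either step fails, the transaction rolls back, so there should never be a record without its job or a job without its record.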

Assuming that you’re inserting and processing jobs on the same node, the scenario you describe is possible. For that particular situation, I suggest reducing your concurrency or scheduling jobs a few seconds in the future. By the time the jobs are ready to execute, your batch inserts will have finished.
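The scheduling suggestion might look roughly like the sketch below, which inserts the chunks with a `schedule_in` delay so that no job becomes available until the inserts are done. The module names, the `:items` queue, the `item_id` argument, and the 30 second delay are all assumptions for illustration; the right delay depends on how long the inserts actually take.

```elixir
defmodule MyApp.ItemWorker do
  # Hypothetical worker; the queue name and args shape are assumptions.
  use Oban.Worker, queue: :items

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"item_id" => _item_id}}) do
    # The per-item database work would go here.
    :ok
  end
end

defmodule MyApp.ItemImporter do
  @chunk_size 1_000
  # Assumed delay in seconds, chosen to outlast the remaining chunk inserts.
  @delay_seconds 30

  def enqueue(item_ids) do
    item_ids
    |> Enum.chunk_every(@chunk_size)
    |> Enum.each(fn chunk ->
      chunk
      # schedule_in keeps each job in the scheduled state until the delay passes.
      |> Enum.map(&MyApp.ItemWorker.new(%{item_id: &1}, schedule_in: @delay_seconds))
      |> Oban.insert_all()
    end)
  end
end
```

The other suggestion, reducing concurrency, would be a matter of lowering the queue limit in the Oban config, for example `queues: [items: 20]`, so that running jobs cannot occupy all 100 connections in the pool from the example.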

I consider async job insertion to be an anti-feature. It’s impossible to prevent a node from shutting down, which means you have no guarantee that your jobs are ever inserted. Two principal goals of Oban are reliability and observability. With synchronous inserts you know with certainty that a job made it into the database, and you can easily observe that it is in there.
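For the caller-side handling raised in the question, one option that keeps inserts synchronous is a small retry wrapper around the insert call. This is only a sketch: `InsertRetry` is a hypothetical helper, not part of Oban, and it assumes a failed insert surfaces as a raised exception (for instance when a connection checkout times out).

```elixir
defmodule InsertRetry do
  # Hypothetical helper, not part of Oban.
  # Calls `insert_fun` synchronously and retries with exponential backoff
  # whenever it raises. Returns {:ok, result} or {:error, exception}.
  def with_retry(insert_fun, attempts \\ 3, backoff_ms \\ 200)
      when is_function(insert_fun, 0) and attempts > 0 do
    try do
      {:ok, insert_fun.()}
    rescue
      error ->
        if attempts > 1 do
          # Wait before retrying, doubling the delay each time.
          Process.sleep(backoff_ms)
          with_retry(insert_fun, attempts - 1, backoff_ms * 2)
        else
          # Out of attempts: hand the failure back to the caller.
          {:error, error}
        end
    end
  end
end
```

A chunk insert could then be wrapped as `InsertRetry.with_retry(fn -> Oban.insert_all(changesets) end)`. Because the caller blocks until the insert either succeeds or gives up, a lost chunk shows up as an explicit `{:error, _}` rather than disappearing silently. Rescuing every exception is deliberately broad here; a real implementation would probably retry only on connection errors.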

swelham · June 15, 2021, 7:26am

Thanks for the detailed breakdown.

I think I was approaching it with the wrong mindset, trying to use Oban like a service (as we do with Faktory) rather than seeing it as just interacting with the DB.