Robust against deploys with a single retry: Oban or Task?

If you are only going to retry once within a short time window, to handle transient failures while still failing quickly for the front end, is Oban still necessary, or is it overkill, if you want the application to be robust in the face of frequent redeploys?

I’m not sure this question can be answered with the detail provided. It depends on how you are queueing the task and what triggers the task itself.

For example, if you have a frontend making an HTTP request that triggers this background job, and you want the job to be resilient to redeploys, you have a few options:

  1. Use Oban, as you mentioned. This adds persistence, so jobs are sourced from the database rather than kept in memory.
  2. Give your application a shutdown window in which it can “drain” and finish in-flight jobs before shutting down. This is feasible, and nice to do, if the jobs are quick to execute. You get most of it for free by spawning these jobs under a supervisor, as long as the job processes trap exits; otherwise the supervisor's shutdown signal terminates them immediately.
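Option 2 can be sketched with Elixir's built-in `Task.Supervisor`. This is a minimal sketch, not a drop-in implementation: the `DrainDemo` module, its `start_job/4` helper, the `:job_started`/`:job_finished` messages, and the timings are all hypothetical. The two parts that matter are the `:shutdown` option of `Task.Supervisor.start_child/3` (5_000 ms by default), which is how long the supervisor waits before killing the task, and `Process.flag(:trap_exit, true)`, without which the supervisor's shutdown signal would end the task immediately.

```elixir
defmodule DrainDemo do
  # Hypothetical helper: starts a job under `sup` that is allowed up to
  # `drain_ms` milliseconds to finish when the supervisor shuts down.
  def start_job(sup, work_ms, drain_ms, notify) do
    Task.Supervisor.start_child(
      sup,
      fn ->
        # Trap exits so the supervisor's :shutdown exit signal arrives as a
        # message instead of killing this process right away.
        Process.flag(:trap_exit, true)
        # Demo-only handshake so the caller knows trapping is in place.
        send(notify, :job_started)

        # Stand-in for the real job.
        Process.sleep(work_ms)
        send(notify, :job_finished)
      end,
      # How long the supervisor waits before sending :kill (default: 5_000 ms).
      shutdown: drain_ms
    )
  end
end

# In a real app this supervisor would be a child in application.ex, e.g.
# {Task.Supervisor, name: MyApp.JobSupervisor} (hypothetical name).
{:ok, sup} = Task.Supervisor.start_link()
{:ok, _pid} = DrainDemo.start_job(sup, 200, 1_000, self())

receive do
  :job_started -> :ok
end

# Stopping the supervisor mimics the application shutting down during a
# deploy: it waits for the job (up to 1_000 ms here) instead of killing it.
:ok = Supervisor.stop(sup)
```

If the job can take longer than `drain_ms`, it is still killed, so this only narrows the window rather than closing it.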

However, note that option 2 does not guarantee that your jobs will be executed. Redeploying shuts down your application gracefully (usually), but your machine could still shut down at any point in time due to external factors. The BEAM could crash as well. Your datacenter could catch on fire. So if you absolutely need the job to be executed, storing it somewhere is probably a good idea.


It is triggered by an HTTP request. On further thought, I am considering simply waiting briefly and retrying synchronously within the request, rather than using a task.
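A synchronous single retry needs very little code. The sketch below assumes the operation is a zero-arity function returning `{:ok, value}` or `{:error, reason}`; the `SyncRetry` module name and the 200 ms default delay are hypothetical. The delay should stay short, since the HTTP request is blocked while it waits.

```elixir
defmodule SyncRetry do
  # Hypothetical helper: call `fun`; if it returns {:error, _}, sleep
  # `delay_ms` and call it exactly once more, returning that second result.
  def with_single_retry(fun, delay_ms \\ 200) when is_function(fun, 0) do
    case fun.() do
      {:error, _reason} ->
        Process.sleep(delay_ms)
        fun.()

      result ->
        result
    end
  end
end

# Simulated transient failure: the first call fails, the second succeeds.
{:ok, counter} = Agent.start_link(fn -> 0 end)

flaky = fn ->
  case Agent.get_and_update(counter, fn n -> {n, n + 1} end) do
    0 -> {:error, :timeout}
    _ -> {:ok, :done}
  end
end

result = SyncRetry.with_single_retry(flaky, 50)
# result => {:ok, :done}
```

In a controller you would wrap the external call with this and map a final `{:error, _}` to an error response, so the front end still gets a quick failure. Because the work lives and dies with the request, a redeploy cannot leave an orphaned background job behind; at worst the request itself fails.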

Yep, if that’s feasible, it’s a good way to avoid a whole class of problems 🙃

Can you increase the time supervised processes are given to drain when shutting down? Say a supervised task takes about 30 seconds on average to complete; I assume it will be interrupted by default? If I want to wait up to 30 seconds for tasks to drain, can I configure that, and at what level? Is there a default timeout?
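For reference, a hedged sketch of where that timeout lives in OTP: it is a per-child setting, the `:shutdown` key of the child spec, which defaults to 5_000 ms for workers and `:infinity` for supervisors (for dynamically started tasks, it is the `:shutdown` option of `Task.Supervisor.start_child/3`). As with tasks, the process must trap exits for the timeout to apply. The `DrainingWorker` module is hypothetical; `use GenServer, shutdown: 30_000` simply puts the value into the generated `child_spec/1`. Your deploy platform may also enforce its own grace period before killing the VM (for example, Kubernetes' `terminationGracePeriodSeconds`), and that needs to be at least as long.

```elixir
defmodule DrainingWorker do
  # `shutdown: 30_000` is merged into the generated child_spec/1, so the
  # parent supervisor waits up to 30 s before killing this process.
  use GenServer, shutdown: 30_000

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Without this, the :shutdown exit signal ends the process immediately
    # and terminate/2 is never called.
    Process.flag(:trap_exit, true)
    {:ok, notify}
  end

  @impl true
  def terminate(_reason, notify) do
    # Hypothetical drain step: finish or hand off in-flight work here.
    send(notify, :drained)
    :ok
  end
end

# The timeout is visible in the child spec the supervisor will use.
spec = DrainingWorker.child_spec(self())
# spec.shutdown => 30_000

{:ok, sup} = Supervisor.start_link([{DrainingWorker, self()}], strategy: :one_for_one)
:ok = Supervisor.stop(sup)
```

Work already running inside a callback finishes first, because the shutdown message is only processed once the current callback returns.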

In addition to the sage advice @whatyouhide already provided, I’ll add that there are other reasons a Task may not be ideal: concurrency controls, scheduling, distribution, instrumentation, visibility, and so on. The article “Oban Starts Where Tasks End” has more details.
