When a job fails, I expected to find the full error in the database. However, discarded jobs (e.g. jobs that failed and reached their maximum number of attempts) are pruned. From the documentation:
Pruning is only applied to jobs that are completed, cancelled or discarded. It’ll never delete a new job, a scheduled job or a job that failed and will be retried.
I was wondering why that choice was made: we want to be able to find the errors, but the pruner deletes them.
I had such a case where something went wrong, and I lost all trace of the error.
So first question: why isn’t there an option to keep jobs that were discarded because of failure in the table (and why isn’t that the default)?
Second question: what is the best way to keep track of errors that happen in Oban workers?
I suspect that I have missed the basic idea behind handling failed jobs. Oban was designed this way for a reason and is already very mature, so I don’t think this is an oversight that calls for a PR. More likely, I don’t fully understand how Oban’s error handling is meant to work.
To me it seems natural to have an option to keep failed jobs in the database for debugging purposes, but apparently that’s not the choice that was made.
This plugin treats all jobs the same and only retains by time. To retain by length or provide custom rules for specific queues, workers and job states see the DynamicPruner plugin in Oban Pro.
Alternatively, thanks to Oban’s plugin design, you could write your own plugin, modeled on the Pruner, that implements your custom behaviour.
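As a rough sketch of the built-in option: the Pruner accepts a `max_age` in seconds, so the retention window can be lengthened for all prunable states at once. The app name `:my_app`, the repo `MyApp.Repo`, the queue settings and the 30-day window are placeholders for illustration, not values from this thread.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    # max_age is given in seconds. Completed, cancelled and discarded jobs
    # older than this become eligible for pruning (here: 30 days).
    {Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 30}
  ]
```

Because the plugin only retains by time, this also keeps completed jobs around for 30 days, so the table will grow accordingly.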
You’re correct that this is by design. By default, each job has 20 attempts, and the standard backoff extends to 12 days. You can increase the number of attempts or tweak the backoff to extend that period. If a job has failed 20 times over the course of nearly two weeks, chances are further attempts won’t succeed either.
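To make the two knobs concrete, here is a minimal sketch of a worker that raises `max_attempts` and overrides the `backoff/1` callback. The module name, the `MyApp.Invoices.deliver/1` function and the chosen numbers are hypothetical; only `use Oban.Worker`, `perform/1` and `backoff/1` come from Oban.

```elixir
defmodule MyApp.Workers.SendInvoice do
  # Hypothetical worker: allow 30 attempts instead of the default 20.
  use Oban.Worker, queue: :default, max_attempts: 30

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    # MyApp.Invoices.deliver/1 is a placeholder for the real work. It is
    # assumed to return :ok or {:error, reason}.
    MyApp.Invoices.deliver(args)
  end

  @impl Oban.Worker
  def backoff(%Oban.Job{attempt: attempt}) do
    # Seconds to wait before the next attempt: one more hour per attempt,
    # capped at one day.
    min(attempt * 3_600, 86_400)
  end
end
```

With this override the wait grows by an hour per failed attempt until it reaches the one-day cap, which is one way to keep a failing job retryable (and therefore unpruned) for longer.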
After reflecting on this further, I still think it would be useful to have a simple option to leave discarded jobs in the DB.
Some apps may not be actively monitored, and failing jobs may sit unnoticed in the DB for several weeks (until max attempts is reached). Once max attempts is reached, the jobs are lost forever. Maybe we really need to send that data to the customer, or do whatever the job was supposed to do, but it’s just gone.
Jobs should never be the canonical store of business state. If an invoice or data needs to go to the customer, that need should be modeled and stored in your regular tables. With pruning you lose the reason the job failed, but you shouldn’t be losing the need or intent.
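One way to keep the failure reason outside of oban_jobs is to listen for Oban’s `[:oban, :job, :exception]` telemetry event and forward it to your logs or error tracker. The sketch below only logs. The module name and handler id are made up for illustration, and it assumes an Oban 2.x release where the event metadata carries `:job`, `:kind`, `:reason` and `:stacktrace`.

```elixir
defmodule MyApp.ObanErrorReporter do
  # Hypothetical module: logs every failed job attempt reported by Oban.
  require Logger

  # Call once at application start, e.g. from Application.start/2.
  def attach do
    :telemetry.attach(
      "oban-error-reporter",
      [:oban, :job, :exception],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:oban, :job, :exception], _measurements, meta, _config) do
    %{job: job, kind: kind, reason: reason, stacktrace: stacktrace} = meta

    Logger.error(
      "Oban job #{job.id} (#{job.worker}) failed on attempt " <>
        "#{job.attempt}/#{job.max_attempts}:\n" <>
        Exception.format(kind, reason, stacktrace)
    )
  end
end
```

The event fires on every failed attempt, not only on the final one, so the error is recorded long before the job is discarded and pruned.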
Ultimately, though, because oban_jobs is just a table, you can do more or less whatever you want here. You could skip the built-in pruner and write or use your own, or you could copy failed jobs over to a different table at some interval. There are a lot of options.
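As an illustration of the "copy to a different table" idea, the following sketch copies discarded jobs into an archive table with a single `insert_all` fed by a query. Everything named here is an assumption: the `discarded_job_archive` table (with a unique index on `job_id`) would have to be created by your own migration, and the module would need to run on a schedule shorter than the pruner’s retention window.

```elixir
defmodule MyApp.DiscardedJobArchiver do
  # Hypothetical helper: copies discarded Oban jobs into a separate table
  # so the error details survive pruning.
  import Ecto.Query

  def archive(repo) do
    discarded =
      from j in Oban.Job,
        where: j.state == "discarded",
        select: %{
          job_id: j.id,
          worker: j.worker,
          queue: j.queue,
          args: j.args,
          errors: j.errors,
          attempt: j.attempt,
          discarded_at: j.discarded_at
        }

    # Assumes a unique index on job_id, so jobs that were already archived
    # on a previous run are skipped instead of duplicated.
    repo.insert_all("discarded_job_archive", discarded,
      on_conflict: :nothing,
      conflict_target: :job_id
    )
  end
end
```

Calling `MyApp.DiscardedJobArchiver.archive(MyApp.Repo)` returns the usual `{count, nil}` tuple from `insert_all`, where `count` is the number of newly archived rows.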