Oban -> action on job becoming "discarded"?

Running Oban, I’d like to be able to perform some actions whenever the number of attempts is exhausted and the job is being transitioned to the “discarded” state. So far I’ve come up with a workaround in the Worker like this:

  • increment the desired max_attempts value by one
  • implement perform() twice: first with a guard on attempt < @max_attempts, second w/o any guards but with an unconditional {:error, :max_attempts_reached} return.

This way the number of actual attempts remains as before. All “real” attempts are handled by the first perform(), while the additional attempt is handled by the second perform(), which does what I want before returning the final, unconditional error tuple. IOW - this kind-of works, but it surely has some drawbacks and also kind-of smells to me. I also find it unlikely that I’m the only one who needs to act on this important state transition, so someone has most probably found a better way to do it. Am I right? Any suggestions?
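
Roughly like this - a minimal sketch of the workaround, where the module name, queue, and the “real” limit of 3 attempts are made up, and do_work/1 / handle_exhausted/1 are placeholders for the actual work and the on-discard action:

```elixir
defmodule MyApp.Workers.SomeWorker do
  # The "real" limit is 3 attempts; we configure 4 so one extra attempt is
  # left for the discard-handling clause below.
  @real_max_attempts 3

  use Oban.Worker, queue: :default, max_attempts: @real_max_attempts + 1

  @impl Oban.Worker
  # "Real" attempts: run the actual work while within the real limit.
  def perform(%Oban.Job{attempt: attempt} = job) when attempt <= @real_max_attempts do
    do_work(job.args)
  end

  # The extra attempt: only reached once the real attempts are exhausted.
  # Run the "on discard" action, then return an error so Oban discards the job.
  def perform(%Oban.Job{} = job) do
    handle_exhausted(job)
    {:error, :max_attempts_reached}
  end

  # Placeholders for the actual work and the on-discard action.
  defp do_work(_args), do: :ok
  defp handle_exhausted(_job), do: :ok
end
```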


Have you tried manually handling when the job should be discarded, instead of using the Oban functionality?

If you are using Oban Pro, Batch provides a handler for the discard state, although I don’t know if it’s overkill.

I am not sure how that could be done, unless you mean implementing my own queue handling etc. Just… what would I then need Oban for? :wink:


Ah, there they hid it!

Use a batch when you need to:

  • […]
  • Respond to specific events like failures or cancellations

So there is actually something like what I need. Although it talks about “batches”, I still guess a “batch” may also consist of a single job, have batch_discarded/1 implemented, and voilà!

Love this community and the suggestions!

You don’t need to use a batch; there are worker hooks.


Thank you. From what I see these are also only available in “Pro”, so I take it the “OSS” version doesn’t have anything that could be used instead of my half-baked quasi-solution :wink:

Depending on the action you’re taking, you could also use the telemetry hooks to do something similar. I would only do this, though, if the action you’re taking is something like “log a special error message”.
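
Something like the rough sketch below is what I mean, attached to the OSS [:oban, :job, :stop] and [:oban, :job, :exception] events. The handler id and module name are made up, and you should double-check the :state values in the Oban.Telemetry docs for your version - I’m assuming :discard and :failure here:

```elixir
defmodule MyApp.ObanDiscardLogger do
  require Logger

  # Attach once, e.g. from Application.start/2.
  def attach do
    :telemetry.attach_many(
      "oban-discard-logger",
      [[:oban, :job, :stop], [:oban, :job, :exception]],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:oban, :job, _event], _measure, %{job: job, state: state}, _config) do
    # A job ends up discarded when perform returns {:discard, _} or when its
    # final attempt fails; the second half of the check covers events that
    # report :failure even on the last attempt.
    if state == :discard or (state == :failure and job.attempt >= job.max_attempts) do
      Logger.error(
        "Oban job #{job.id} (#{job.worker}) discarded after #{job.attempt} attempt(s)"
      )
    end

    :ok
  end
end
```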

Overall though I don’t think your solution is particularly bad.


@sorenone, this is tangential (happy to open another topic if you’d like).

We sometimes get these jobs that are left in a moot state, where:

  • the status is discarded
  • there are no errors recorded in errors
  • max_attempts == attempt (2 == 2, for example; in our case we bumped it to 2 after having it at 1 for a while)
  • the worker hook (after_process/3) doesn’t seem to fire (we are diagnosing what could trigger this; we think the container gets nuked before reaching it somehow)

I was wondering if you have seen this before in the wild?

Yes, we have seen this, often!

There is also an option in Pro’s DynamicLifeline that will rope them back to the runnable state to run again. If you want to be sure that these run, you have to increase the number of attempts or how long the system can run before shutting down.
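
On the OSS side, the rough equivalents are the worker’s max_attempts and Oban’s :shutdown_grace_period option. For illustration only - the app name and values here are made up, and a hard kill of the container will of course still cut this short:

```elixir
# config/config.exs
config :my_app, Oban,
  repo: MyApp.Repo,
  # How long Oban waits for executing jobs to finish during shutdown
  # (the default is 15 seconds).
  shutdown_grace_period: :timer.seconds(60),
  queues: [default: 10]
```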


Thank you, but while it works in a way, it still has a number of drawbacks. The most visible ones are:

  • The number of actual attempts does not match the number configured in the job. While not a huge problem, it still doesn’t feel right.
  • Once the job is discarded and, let’s say, I find the cause of the problem and fix it, I cannot “retry” the failed job w/o manually intervening at the database level first. Definitely not something I’d be happy to do in a production environment.

The Oban.Plugins.Lifeline plugin (Oban v2.19.4) isn’t on by default. That’s how it works.
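
Enabling it looks roughly like this (the rescue_after value is just an example; note that it only rescues jobs stuck in the executing state, it does not retry discarded ones):

```elixir
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Move jobs orphaned in the `executing` state (e.g. after a node was
    # killed) back to `available` once they have been stuck longer than
    # rescue_after.
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ],
  queues: [default: 10]
```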

True, you would have to get in there and do it manually. So you can do it, but it’s manual or via the dashboard. I don’t understand why that is an issue, though. Would you please clarify why doing so manually is a problem? It will help us going forward.


I think you misunderstood my post. I was not responding to the question about “jobs left in a moot state”. I was referring to my workaround for the OSS version’s lack of the hooks you mentioned. Once the job is discarded (for a valid reason), I can’t do anything more with it from the dashboard. I need to update the job’s fields directly at the DB level. Yes, I could prepare some script(s) to avoid catastrophic typos and the like. But still - that’s a production DB. Feels like wandering around a production server, trying things in a #-prompted shell :wink:

You can and should retry discarded jobs through the dashboard. You can also do it with Oban.retry_job/1 from the console, without touching the prod DB.
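
For example, from an iex console attached to the running node (the job id and worker name here are made up):

```elixir
import Ecto.Query

# Retry a single discarded job by id (an %Oban.Job{} struct works too)
Oban.retry_job(123)

# Or retry every discarded job for a given worker in one go
Oban.Job
|> where(worker: "MyApp.Workers.SomeWorker", state: "discarded")
|> Oban.retry_all_jobs()
```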

Which version of Web are you using? Do you see the retry option in the jobs header or details page?

Thanks for taking the time to clarify and help me understand. It helps refine the flow and experience for others.

Of course, and with pleasure. Please check the very first post in this thread; it should explain. There is nothing wrong with my Oban Web interface - the “retry” options are there, etc. The problem lies with how my workaround to the original issue works, and I was explaining one of the reasons why I am not overly happy with it. Simply put, any attempt beyond the initial “real” ones returns {:error, :max_attempts_reached}.

Now, having said that, I think I might be able to improve it somewhat by guarding differently - for example, against attempt being exactly the expected max_attempts + 1. I need to get back to this.
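
Something along these lines is what I have in mind - still the same bump-by-one trick, but the unconditional clause only matches that one extra attempt, so later attempts (e.g. after a manual retry, which bumps max_attempts) fall back to the normal work. Untested sketch, placeholder names as before:

```elixir
defmodule MyApp.Workers.SomeWorker do
  @real_max_attempts 3

  use Oban.Worker, queue: :default, max_attempts: @real_max_attempts + 1

  @impl Oban.Worker
  # Exactly the one "extra" attempt: run the on-discard action, then let the
  # job be discarded with the final error tuple.
  def perform(%Oban.Job{attempt: attempt} = job) when attempt == @real_max_attempts + 1 do
    handle_exhausted(job)
    {:error, :max_attempts_reached}
  end

  # Everything else (the real attempts, and any attempts beyond the extra one
  # created by retrying from Web or Oban.retry_job) does the normal work.
  def perform(%Oban.Job{} = job), do: do_work(job.args)

  defp do_work(_args), do: :ok
  defp handle_exhausted(_job), do: :ok
end
```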


That does help!

Then that’s a byproduct of the way you chose to work around the problem. There are alternatives available, paid or otherwise, as suggested in this thread :woman_shrugging:

You’re right, though: retrying through the Web interface, or with Oban.retry_job, will already increment the max_attempts. The database constraints actually make it impossible to have attempt greater than max_attempts, so that’s a requirement.

Thanks for taking the time to help us understand!