Oban -> action on job becoming "discarded"?

Running Oban, I’d like to be able to perform some actions whenever the number of attempts is exhausted and the job is being transitioned to the “discarded” state. So far I’ve come up with a workaround in the Worker like this:

  • increment the desired max_attempts value by one
  • implement perform() twice: first with a guard on attempt < @max_attempts, second w/o any guards but with an unconditional {:error, :max_attempts_reached} return.

This way the number of actual attempts remains as before. All “real” attempts are handled by the first perform(), while the additional attempt is handled by the second perform(), which does what I want before returning the final, unconditional error tuple. IOW - this kind-of works, but it surely has some drawbacks and also kind-of smells to me. I also find it unlikely that I’m the only one who needs to act on this important state transition, so someone has most probably found a better way to do it. Am I right? Any suggestions?
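
Roughly like this - a minimal sketch of the workaround, where the module name, queue, and the “real” limit of 3 attempts are made up, and do_work/1 / handle_exhausted/1 are placeholders for the actual work and the on-discard action:

```elixir
defmodule MyApp.Workers.SomeWorker do
  # The "real" limit is 3 attempts; we configure 4 so one extra attempt is
  # left for the discard-handling clause below.
  @real_max_attempts 3

  use Oban.Worker, queue: :default, max_attempts: @real_max_attempts + 1

  @impl Oban.Worker
  # "Real" attempts: run the actual work while within the real limit.
  def perform(%Oban.Job{attempt: attempt} = job) when attempt <= @real_max_attempts do
    do_work(job.args)
  end

  # The extra attempt: only reached once the real attempts are exhausted.
  # Run the "on discard" action, then return an error so Oban discards the job.
  def perform(%Oban.Job{} = job) do
    handle_exhausted(job)
    {:error, :max_attempts_reached}
  end

  # Placeholders for the actual work and the on-discard action.
  defp do_work(_args), do: :ok
  defp handle_exhausted(_job), do: :ok
end
```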


Have you tried manually handling when the job should be discarded, instead of using the Oban functionality?

If you are using Oban Pro, Batch provides a handler for the discard state, although I don’t know if it’s overkill.

I am not sure how that could be done, unless you mean implementing my own queue handling etc. Just… what would I then need Oban for? :wink:


Ah, there they hid it!

Use a batch when you need to:

  • […]
  • Respond to specific events like failures or cancellations

So there is actually something like what I need. Although it talks about “batches”, I still guess a “batch” may also consist of a single job, have batch_discarded/1 implemented, and voilà!

Love this community and the suggestions!

You don’t need to use a batch; there are worker hooks.


Thank you. From what I see these are also only available in “Pro”, so I take it the “OSS” version doesn’t have anything that could be used instead of my half-baked quasi-solution :wink:

Depending on the action you’re taking, you could also use the telemetry hooks to do something similar. I would only do this, though, if the action you’re taking is something like “log a special error message”.
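
Something like the rough sketch below is what I mean, attached to the OSS [:oban, :job, :stop] and [:oban, :job, :exception] events. The handler id and module name are made up, and you should double-check the :state values in the Oban.Telemetry docs for your version - I’m assuming :discard and :failure here:

```elixir
defmodule MyApp.ObanDiscardLogger do
  require Logger

  # Attach once, e.g. from Application.start/2.
  def attach do
    :telemetry.attach_many(
      "oban-discard-logger",
      [[:oban, :job, :stop], [:oban, :job, :exception]],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:oban, :job, _event], _measure, %{job: job, state: state}, _config) do
    # A job ends up discarded when perform returns {:discard, _} or when its
    # final attempt fails; the second half of the check covers events that
    # report :failure even on the last attempt.
    if state == :discard or (state == :failure and job.attempt >= job.max_attempts) do
      Logger.error(
        "Oban job #{job.id} (#{job.worker}) discarded after #{job.attempt} attempt(s)"
      )
    end

    :ok
  end
end
```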

Overall though I don’t think your solution is particularly bad.


@sorenone, this is tangential (happy to open another topic if you’d like).

We sometimes get these jobs that are left in a moot state, where:

  • the status is discarded
  • there are no errors recorded in errors
  • max_attempts == attempt (2 == 2, for example; in our case we bumped it to 2 after having it at 1 for a while)
  • the worker hook (after_process/3) doesn’t seem to fire (we are diagnosing what could trigger this; we think the container gets nuked before reaching it somehow)

I was wondering if you have seen this before in the wild?

Yes, we have seen this, often!

There is also an option in Pro’s DynamicLifeline that will rope them back to the runnable state to run again. If you want to be sure that these run, you have to increase the number of attempts or how long the system can run before shutting down.
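
On the OSS side, the rough equivalents are the worker’s max_attempts and Oban’s :shutdown_grace_period option. For illustration only - the app name and values here are made up, and a hard kill of the container will of course still cut this short:

```elixir
# config/config.exs
config :my_app, Oban,
  repo: MyApp.Repo,
  # How long Oban waits for executing jobs to finish during shutdown
  # (the default is 15 seconds).
  shutdown_grace_period: :timer.seconds(60),
  queues: [default: 10]
```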


Thank you, but while it works in a way, it still has a number of drawbacks. The most visible ones are:

  • The number of actual attempts does not match the number configured in the job. While not a huge problem, it still doesn’t feel right.
  • Once the job is discarded and, let’s say, I find the cause of the problem and fix it, I cannot “retry” the failed job w/o manually intervening at the database level first. Definitely not something I’d be happy to do in a production environment.

The Oban.Plugins.Lifeline plugin (Oban v2.19.4) isn’t on by default. That’s how it works.
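
Enabling it looks roughly like this (the rescue_after value is just an example; note that it only rescues jobs stuck in the executing state, it does not retry discarded ones):

```elixir
config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [
    # Move jobs orphaned in the `executing` state (e.g. after a node was
    # killed) back to `available` once they have been stuck longer than
    # rescue_after.
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ],
  queues: [default: 10]
```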

True, you would have to get in there and do it manually. So you can do it, but it’s manual or via the dashboard. I don’t understand why that is an issue, though. Would you please clarify why doing so manually is a problem? It will help us going forward.


I think you misunderstood my post. I was not responding to the question about “jobs left in a moot state”. I was referring to my workaround for the OSS version’s lack of the hooks you mentioned. Once the job is discarded (for a valid reason), I can’t do anything more with it from the dashboard. I need to update the job’s fields directly at the DB level. Yes, I could prepare some script(s) to avoid catastrophic typos and the like. But still - that’s a production DB. Feels like wandering around a production server, trying things in a #-prompted shell :wink:

You can and should retry discarded jobs through the dashboard. You can also do it with Oban.retry_job/1 from the console, without touching the prod DB.
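
For example, from an iex console attached to the running node (the job id and worker name here are made up):

```elixir
import Ecto.Query

# Retry a single discarded job by id (an %Oban.Job{} struct works too)
Oban.retry_job(123)

# Or retry every discarded job for a given worker in one go
Oban.Job
|> where(worker: "MyApp.Workers.SomeWorker", state: "discarded")
|> Oban.retry_all_jobs()
```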

Which version of Web are you using? Do you see the retry option in the jobs header or details page?

Thanks for taking the time to clarify and help me understand. It helps refine the flow and experience for others.

Of course, and with pleasure. Please check the very first post in this thread; it should explain. There is nothing wrong with my Oban Web interface - the “retry” options are there, etc. The problem lies with how my workaround to the original issue works, and I was explaining one of the reasons why I am not overly happy with it. Simply put, any attempt beyond the initial “real” ones returns {:error, :max_attempts_reached}.

Now, having said that, I think I might be able to improve it somewhat by guarding differently - for example, against attempt being exactly the expected max_attempts + 1. I need to get back to this.
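
Something along these lines is what I have in mind - still the same bump-by-one trick, but the unconditional clause only matches that one extra attempt, so later attempts (e.g. after a manual retry, which bumps max_attempts) fall back to the normal work. Untested sketch, placeholder names as before:

```elixir
defmodule MyApp.Workers.SomeWorker do
  @real_max_attempts 3

  use Oban.Worker, queue: :default, max_attempts: @real_max_attempts + 1

  @impl Oban.Worker
  # Exactly the one "extra" attempt: run the on-discard action, then let the
  # job be discarded with the final error tuple.
  def perform(%Oban.Job{attempt: attempt} = job) when attempt == @real_max_attempts + 1 do
    handle_exhausted(job)
    {:error, :max_attempts_reached}
  end

  # Everything else (the real attempts, and any attempts beyond the extra one
  # created by retrying from Web or Oban.retry_job) does the normal work.
  def perform(%Oban.Job{} = job), do: do_work(job.args)

  defp do_work(_args), do: :ok
  defp handle_exhausted(_job), do: :ok
end
```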


That does help!

Then that’s a byproduct of the way you chose to work around the problem. There are alternatives available, paid or otherwise, as suggested in this thread :woman_shrugging:

You’re right, though: retrying through the Web interface, or with Oban.retry_job, will already increment the max_attempts. The database constraints actually make it impossible to have attempt greater than max_attempts, so that’s a requirement.

Thanks for taking the time to help us understand!