Hey, there!
We’re using Oban and Oban Pro. We have a main worker that pulls data from 3rd parties daily.
Here’s how we define it:
defmodule Integrations.Workers.Workflow do
  use Oban.Pro.Worker,
    queue: :integrations,
    max_attempts: 5,
    unique: [
      period: :infinity,
      keys: [:integration_id, :service],
      states: [:available, :scheduled, :executing, :retryable]
    ]
This worker then triggers a Batch which uses Oban.Pro.Workers.Batch.
The idea is that we always generate a new Integrations.Workers.Workflow and schedule it 24 hours later. There are 3 potential outcomes:
- we have errors: in the ErrorHandler we schedule an Integrations.Workers.Workflow after 24 hours.
Here’s how we attach the telemetry:
:telemetry.attach(
  "oban-job-error",
  [:oban, :job, :exception],
  &Integrations.Telemetry.ErrorHandler.handle_event_wrapper/4,
  []
)
- we have a discard that requires a reschedule: in the StopHandler we schedule an Integrations.Workers.Workflow after 24 hours.
Here’s how we attach the telemetry:
:telemetry.attach(
  "oban-job-stop",
  [:oban, :job, :stop],
  &Integrations.Telemetry.StopHandler.handle_event_wrapper/4,
  nil
)
- all is well: in the custom callback worker for the Batch we schedule a new Integrations.Workers.Workflow for 24 hours later as well.
All of this scheduling is done like this:
args
|> Integrations.Workers.Workflow.new(
  queue: args["queue"],
  schedule_in: args["schedule_update_in"]
)
|> Oban.insert!()
Recently we updated Oban from 2.16.2 to 2.17.3 and Oban Pro from 1.1.4 to 1.3.0 because we wanted to use the DynamicCron plugin and its scheduling guarantees.
This caused issues: the ErrorHandler and StopHandler stopped being able to reschedule Integrations.Workers.Workflow workers (we have a log after the insert and it appears, but the worker does not get rescheduled).
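To tell whether such an insert really created a new job, the log after the insert could include the returned job's id, state and conflict? flag. With a unique configuration like the one above, Oban.insert!/1 returns the already existing job with conflict?: true instead of inserting a new one. Below is a minimal sketch of that check, not our actual handler code: the module name Integrations.Rescheduler and the function reschedule/1 are hypothetical, and it assumes args has the same shape as in the scheduling snippet above.

```elixir
defmodule Integrations.Rescheduler do
  # Hypothetical helper for illustration only; this module does not exist in our codebase.
  require Logger

  def reschedule(args) do
    job =
      args
      |> Integrations.Workers.Workflow.new(
        queue: args["queue"],
        schedule_in: args["schedule_update_in"]
      )
      |> Oban.insert!()

    # conflict? is true when the unique constraint matched an existing job.
    # In that case `job` is the existing job and nothing new was inserted.
    Logger.info(
      "rescheduled workflow id=#{job.id} state=#{job.state} conflict?=#{job.conflict?}"
    )

    job
  end
end
```

If the log shows conflict?=true together with the id of the job that is currently finishing, the insert was deduplicated against that job rather than lost.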
Skimming through the changelogs, we noticed that the ack_async option had been added between those versions. After setting it to false, the ErrorHandler and the StopHandler can schedule the Workflows again.
We also tried upgrading the libs in sandbox, oban from 2.17.3 to 2.17.10 and oban_pro from 1.3.0 to 1.4.9, but that has the same issue. (It also raised the DB to 100% in prod, but we’re not sure whether that’s related to using ack_async: false or not.)
Has anyone faced a similar problem? If not, do you have tips on how we can change our flow in a way that prevents these issues?
Thanks!