Oban: error handling and job retry

I’m implementing a controller that schedules a background job and returns.
The job itself is implemented as a GenServer Foo.

I wanted this to be reliable: if Foo fails it is restarted, different Foo jobs are distributed, and I have a record of what went wrong and what went right.
Coming from Ruby, I wanted guarantees similar to Sidekiq’s, so I decided to use Oban workers.

I created an Oban worker that calls start_link on the GenServer Foo and then waits in a receive for a message from Foo saying that it has completed its job.

I thought that if Foo died, the worker would die as well and Oban would restart it. Wrong: the Oban worker is only retried if it raises an error, not if it gets killed.

So instead, I called Process.monitor on Foo’s pid, and I check whether the message I receive is Foo’s completion message or a :DOWN message. If it’s the latter, I raise an exception so that Oban retries the job later.
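A minimal sketch of that monitoring logic, using only the standard library. The module name `FooWaiter`, the `{:foo_done, pid, result}` message shape, and the timeout are assumptions made for illustration, not the actual code from this job.

```elixir
defmodule FooWaiter do
  # Hypothetical helper: waits for either a completion message from `pid`
  # or the :DOWN message produced by the monitor.
  def await(pid, timeout \\ 5_000) do
    ref = Process.monitor(pid)

    receive do
      # Assumed completion message sent by Foo to the waiting process.
      {:foo_done, ^pid, result} ->
        # Drop the monitor and flush any :DOWN already in the mailbox.
        Process.demonitor(ref, [:flush])
        {:ok, result}

      # Foo exited (for any reason) before reporting completion.
      {:DOWN, ^ref, :process, ^pid, reason} ->
        {:error, {:foo_down, reason}}
    after
      timeout ->
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end
end
```

Inside the worker’s perform/1, an `{:error, reason}` result can be returned as-is or turned into a raise; either way Oban treats the attempt as failed and schedules a retry.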

This works, but I feel like I’m suddenly implementing some kind of supervisor mechanism, and I should probably use mechanisms that are already present in Elixir. What would you suggest here? Also, having to pass a “call me back” pid to Foo so that it can report successful completion seems like unnecessary coupling. Maybe this is not a use case for Oban?

For those interested in details: the job is to download a huge file and process it. The controller endpoint just accepts the URL of the huge file and schedules this job.


Is this the only place you’re using the Foo GenServer? If so, perhaps you don’t need a separate GenServer or process at all. Each job in Oban already runs as an isolated process.

This can work, you’d just need to call Process.flag(:trap_exit, true) within the Oban job to ensure that it receives an {:EXIT, pid, reason} message rather than crashing the process. By default Oban jobs do not trap exits.
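As a rough illustration of what trapping exits changes, here is a standard-library-only sketch. `TrapExitDemo` and `run_linked/1` are hypothetical names; in a real job the same flag would be set at the top of perform/1 before linking to Foo.

```elixir
defmodule TrapExitDemo do
  # Runs `fun` in a linked process and reports how that process ended.
  def run_linked(fun) do
    # Without this flag, an abnormal exit of the linked process would
    # terminate the caller too instead of arriving as a message.
    previous = Process.flag(:trap_exit, true)
    pid = spawn_link(fun)

    result =
      receive do
        {:EXIT, ^pid, :normal} -> :ok
        {:EXIT, ^pid, reason} -> {:error, reason}
      end

    # Restore whatever the flag was before this call.
    Process.flag(:trap_exit, previous)
    result
  end
end
```

With this in place, `TrapExitDemo.run_linked(fn -> exit(:boom) end)` returns `{:error, :boom}` instead of taking the calling process down, so the job can decide for itself whether to fail the attempt.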

Sending messages between processes will always add a little bit of complexity: passing pids, sending messages and monitoring are par for the course. If possible, I’d first try to put all of this within a single process. Failing that, I would try to use a Task rather than a GenServer and simplify the message passing.
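A small sketch of the Task variant, assuming the download and processing can be expressed as one function call. `TaskDemo` and `download_and_process/1` are hypothetical stand-ins for the real work.

```elixir
defmodule TaskDemo do
  # Runs the work in a linked Task and returns its result directly,
  # so no pid or completion message has to be passed around by hand.
  def process(url, timeout \\ 5_000) do
    task = Task.async(fn -> download_and_process(url) end)

    # Task.await/2 exits the caller if the task does not reply in time.
    # Because the task is linked, a crash in the task also takes the
    # caller down (unless the caller traps exits).
    Task.await(task, timeout)
  end

  # Hypothetical stand-in for the real download and processing step.
  defp download_and_process(url) do
    {:ok, {:processed, url}}
  end
end
```

Calling `TaskDemo.process(url)` from the job keeps the result as a plain return value. If the extra process buys nothing, calling the processing function directly in the job is simpler still.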