Process gets killed when executing as Oban job, but works just fine otherwise

hubertlepicki · November 6, 2025, 4:36pm

I have a bit of a mystery on my hands. I have an Oban job that backfills data in batches. It is supposed to run for hours, but it gets killed with ** (EXIT from #PID<0.39361611.0>) killed after around 5 minutes. Sometimes it’s 4.5 minutes, sometimes 6.5, but that’s the range.
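For reference, an exit reason of killed is what a process reports after receiving an untrappable :kill exit signal. The sketch below reproduces that reason in isolation, using only the standard library. The sleeping process is a hypothetical stand-in for the job, and the explicit Process.exit/2 call stands in for whatever is actually terminating it, which the post does not identify.

```elixir
# Hypothetical stand-in for the long-running job process.
pid = spawn(fn -> Process.sleep(:infinity) end)

# Monitor it so we receive a :DOWN message carrying the exit reason.
ref = Process.monitor(pid)

# Simulate an external killer. A :kill signal cannot be trapped,
# and the target exits with reason :killed.
Process.exit(pid, :kill)

reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(reason, label: "exit reason")
```

Monitoring the real job process the same way would at least confirm that the reason is :killed rather than a crash inside the worker.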

Now, I do have the timeout set to :infinity on this job. I also have other jobs that sometimes take 10, 20 or 40 minutes and do not get killed.

The job goes through quite a lot of data in a recursive manner, but it’s not leaking memory: the functions are properly tail-call optimized.

If I just start the job with spawn fn -> MyWorker.perform(:ignore) end, it works as expected. Memory does not leak, it’s stable, and the script runs for hours with no issues.

But if I start it from Oban, it’s 5-6 minutes and it gets killed.

Does anyone have an idea what this could be?

sorentwo · November 7, 2025, 7:25am

This is an odd one. There are a couple of differences between how a job is executed and wrapping it in spawn. Perhaps running it in ways that get incrementally closer to how Oban executes a job will reveal something?

1. Run with Task.start rather than spawn
2. Run from a dynamic supervisor (Task.Supervisor.start_child)
3. Run with Oban.Queue.Executor (essentially the same as using Testing.perform_job/2)
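A minimal sketch of the first two steps, next to the original spawn for comparison, using only the standard library. The work function is a hypothetical placeholder for MyWorker.perform(:ignore), and the short sleep only simulates a batch. The Oban step is left out because it needs the application's own Oban setup.

```elixir
# Hypothetical placeholder for MyWorker.perform(:ignore).
work = fn label ->
  Process.sleep(50)
  {:done, label}
end

parent = self()
run = fn label -> send(parent, work.(label)) end

# Baseline from the original post: a bare, unlinked process.
spawn(fn -> run.(:spawn) end)

# Step 1: a Task started without a link to the caller.
{:ok, _task_pid} = Task.start(fn -> run.(:task) end)

# Step 2: the same function as a child of a Task.Supervisor.
{:ok, sup} = Task.Supervisor.start_link()
{:ok, _child_pid} = Task.Supervisor.start_child(sup, fn -> run.(:supervised) end)

# Collect one message per variant, giving up after a second each.
results =
  for _ <- 1..3 do
    receive do
      {:done, label} -> label
    after
      1_000 -> :timeout
    end
  end

IO.inspect(Enum.sort(results), label: "finished")
```

With the real worker in place of the placeholder, the first variant that gets killed narrows down which difference in the execution environment matters.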