Process gets killed when executing as Oban job, but works just fine otherwise

I have a bit of a mystery on my hands. I have an Oban job that backfills data in batches. It was supposed to run for hours, but it gets killed with ** (EXIT from #PID<0.39361611.0>) killed after around 5 minutes. Sometimes it’s 4.5 minutes, sometimes it’s 6.5, but that’s the range.

Now, I do have the timeout set to :infinity on this job. I also have other jobs that sometimes take 10, 20 or 40 minutes and do not get killed.

The job goes through quite a lot of data, and it does so recursively, but it is not leaking memory: the functions are properly tail-call optimized.

If I just start the job with spawn fn -> MyWorker.perform(:ignore) end it works as expected. Memory does not leak, it’s stable, and the script runs for hours with no issues.

But if I start it from Oban, it runs for 5-6 minutes and then gets killed.

Does anyone have an idea what this could be?


This is an odd one. There are a couple of differences between how Oban executes a job and wrapping it in spawn. Perhaps running it in ways that get incrementally closer to how Oban executes it will reveal something?

  1. Run with Task.start rather than spawn
  2. Run from a dynamic supervisor (Task.Supervisor.start_child)
  3. Run with Oban.Queue.Executor (essentially the same as using Testing.perform_job/2)
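The first two steps above could be tried from an IEx session with something like the sketch below. BackfillDebug and its function names are hypothetical helpers made up for illustration; only Task.start/1, Task.Supervisor.start_link/0, Task.Supervisor.start_child/2 and Process.monitor/1 are standard library calls. The monitor is there so that the exit reason (for example :killed) is captured instead of only seeing the process disappear.

```elixir
defmodule BackfillDebug do
  # Hypothetical helper module, not part of Oban or Elixir.

  # Monitor `pid` and block until it exits, returning the exit reason.
  # If the process has already exited before the monitor is set up,
  # the reason is :noproc.
  def await_exit(pid) do
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    end
  end

  # Step 1: run the function in an unlinked Task instead of a bare spawn.
  def via_task(fun) do
    {:ok, pid} = Task.start(fun)
    await_exit(pid)
  end

  # Step 2: run the function as a child of a Task.Supervisor.
  # The supervisor is started here only for the experiment and is
  # stopped once the child has exited.
  def via_task_supervisor(fun) do
    {:ok, sup} = Task.Supervisor.start_link()
    {:ok, pid} = Task.Supervisor.start_child(sup, fun)
    reason = await_exit(pid)
    Supervisor.stop(sup)
    reason
  end
end
```

Assuming MyWorker from the question is loaded, BackfillDebug.via_task(fn -> MyWorker.perform(:ignore) end) blocks until the worker stops and returns the reason it exited with. If one of these variants also dies after roughly five minutes, the cause is more likely in how the process is started or supervised than in Oban itself.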