Unusual Oban behaviour - 15 second delays

Hi all,

I am using Oban for the first time and I am experiencing some weird behaviour.

I have this config

config :services, MyApp.Repo,
  url: "postgres://postgres:postgres@postgres:5432/myapp",
  stacktrace: true,
  show_sensitive_data_on_connection_error: true,
  pool_size: 10

config :services, Oban,
  repo: MyApp.Repo,
  queues: [
    default: 1
  ],
  plugins: [
    Oban.Plugins.Pruner
  ]

When I launch this job

defmodule Services.Jobs.ObanTest do
  @moduledoc false

  use Oban.Worker, queue: :default

  require Logger

  # Jobs enqueued with an "id" simulate a small unit of work.
  @impl Oban.Worker
  def perform(%Job{args: %{"id" => id}}) do
    Logger.info("#{id}: processing ...")
    Process.sleep(100)
    :ok
  end

  # A job without args seeds the queue with 100 work jobs.
  def perform(%Job{}) do
    Logger.info("Enqueueing jobs ...")
    jobs = for i <- 1..100, do: new(%{"id" => i})
    Oban.insert_all(jobs)
    :ok
  end
end

Oban processes all the jobs, but with odd timing:

09:11:31.156 pid=<0.1709.0> [info] Enqueueing jobs ...
09:11:31.255 pid=<0.1710.0> [info] 7: processing ...
09:11:31.364 pid=<0.1711.0> [info] 18: processing ...
09:11:31.472 pid=<0.1712.0> [info] 32: processing ...
09:11:31.581 pid=<0.1713.0> [info] 46: processing ...
09:11:31.689 pid=<0.1714.0> [info] 60: processing ...
09:11:31.798 pid=<0.1715.0> [info] 74: processing ...
09:11:31.907 pid=<0.1716.0> [info] 89: processing ...
09:11:47.313 pid=<0.1717.0> [info] 1: processing ...
09:11:48.292 pid=<0.1718.0> [info] 15: processing ...
09:11:48.402 pid=<0.1719.0> [info] 88: processing ...
09:11:49.298 pid=<0.1720.0> [info] 13: processing ...
09:11:49.406 pid=<0.1721.0> [info] 38: processing ...
09:11:49.514 pid=<0.1722.0> [info] 67: processing ...
09:11:49.622 pid=<0.1723.0> [info] 85: processing ...
09:11:50.318 pid=<0.1724.0> [info] 25: processing ...
09:11:50.428 pid=<0.1725.0> [info] 68: processing ...
09:12:06.445 pid=<0.1726.0> [info] 19: processing ...

Why does it sometimes pause for roughly 15 seconds?
Is it due to my Repo configuration?

Edit: and sometimes it seems that not every job gets enqueued:

iex(11)> 09:49:05.143 pid=<0.782.0> [info] 21: processing ...
09:49:05.143 pid=<0.783.0> [info] 22: processing ...
09:49:05.143 pid=<0.784.0> [info] 23: processing ...
09:49:05.144 pid=<0.786.0> [info] 25: processing ...
09:49:05.144 pid=<0.785.0> [info] 24: processing ...
09:49:05.144 pid=<0.787.0> [info] 26: processing ...
09:49:05.144 pid=<0.788.0> [info] 27: processing ...
09:49:05.144 pid=<0.789.0> [info] 28: processing ...
09:49:05.144 pid=<0.790.0> [info] 29: processing ...
09:49:05.144 pid=<0.791.0> [info] 30: processing ...
09:49:05.144 pid=<0.792.0> [info] 31: processing ...
09:49:05.144 pid=<0.793.0> [info] 32: processing ...
09:49:05.144 pid=<0.794.0> [info] 33: processing ...
09:49:05.144 pid=<0.795.0> [info] 34: processing ...
09:49:05.144 pid=<0.796.0> [info] 35: processing ...
09:49:05.144 pid=<0.797.0> [info] 36: processing ...
09:49:05.144 pid=<0.798.0> [info] 37: processing ...
09:49:05.144 pid=<0.799.0> [info] 38: processing ...
09:49:05.144 pid=<0.800.0> [info] 39: processing ...
09:49:05.145 pid=<0.801.0> [info] 40: processing ...
 
nil
iex(12)> Oban.cancel_all_jobs(Oban.Job)                  
{:ok, 0}

Am I missing something?

Regards.


A 15 second delay seems like retry backoff. If you check the attempt field for those jobs, I expect you’ll find a job with more than one attempt.
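
For example, a quick query from IEx along these lines (just a sketch, assuming your repo is MyApp.Repo and the standard Oban.Job schema) would surface any retried jobs:

# List jobs in the default queue that have been attempted more than once,
# together with their state and any recorded errors.
import Ecto.Query

MyApp.Repo.all(
  from j in Oban.Job,
    where: j.queue == "default" and j.attempt > 1,
    select: {j.id, j.args, j.attempt, j.state, j.errors}
)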

I recommend using Oban.Telemetry.attach_default_logger to get runtime logging. That will show timing information and any errors jobs encounter.
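
Attaching it is a one-liner, typically done once at startup (the available options vary between Oban versions, so treat this as a sketch):

# e.g. in application.ex or from IEx; logs job start/stop/exception events.
Oban.Telemetry.attach_default_logger()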


It definitely seems to skip some jobs:

18:20:40.856 [debug] {"args":{},"attempt":1,"event":"job:start","id":506,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499640855898355,"tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:40.856 [info] Enqueueing jobs ...
18:20:40.873 [debug] {"args":{},"attempt":1,"duration":14996,"event":"job:stop","id":506,"max_attempts":20,"meta":{},"queue":"default","queue_time":14217,"source":"oban","state":"success","tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:40.880 [debug] {"args":{"id":1},"attempt":1,"event":"job:start","id":507,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499640880534022,"tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:40.880 [info] 1: processing ...
18:20:40.985 [debug] {"args":{"id":1},"attempt":1,"duration":102519,"event":"job:stop","id":507,"max_attempts":20,"meta":{},"queue":"default","queue_time":15624,"source":"oban","state":"success","tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:40.993 [debug] {"args":{"id":3},"attempt":1,"event":"job:start","id":509,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499640992885772,"tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:40.993 [info] 3: processing ...
18:20:41.095 [debug] {"args":{"id":3},"attempt":1,"duration":100717,"event":"job:stop","id":509,"max_attempts":20,"meta":{},"queue":"default","queue_time":130427,"source":"oban","state":"success","tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:41.102 [debug] {"args":{"id":5},"attempt":1,"event":"job:start","id":511,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499641102158605,"tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:41.102 [info] 5: processing ...
18:20:41.205 [debug] {"args":{"id":5},"attempt":1,"duration":101299,"event":"job:stop","id":511,"max_attempts":20,"meta":{},"queue":"default","queue_time":239478,"source":"oban","state":"success","tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:41.212 [debug] {"args":{"id":8},"attempt":1,"event":"job:start","id":514,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499641212085189,"tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:41.212 [info] 8: processing ...
18:20:41.315 [debug] {"args":{"id":8},"attempt":1,"duration":102909,"event":"job:stop","id":514,"max_attempts":20,"meta":{},"queue":"default","queue_time":349732,"source":"oban","state":"success","tags":[],"worker":"Services.Jobs.ObanTester"}
18:20:41.323 [debug] {"args":{"id":10},"attempt":1,"event":"job:start","id":516,"max_attempts":20,"meta":{},"queue":"default","source":"oban","system_time":1667499641323282605,"tags":[],"worker":"Services.Jobs.ObanTester"}

I’ve tried the exact same code in a greenfield project, as a test, and it works perfectly.
So it must be something in my configuration.

I’ll investigate further and will keep you informed.


Just found the culprit: the only difference between my project and the greenfield one used for testing is … Docker!
I’ve run my project outside Docker and … it works perfectly!
I’ll have a look at why Docker causes this odd behaviour.

Anyway, thanks a lot for your help and keep up the excellent work on Oban !


Maybe the container is given too few CPU cores?
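
One way to sanity-check that from inside the container is to ask the BEAM how many scheduler threads it actually sees, for example:

# Run in an IEx session inside the container; this usually matches the number
# of CPU cores the runtime was allowed to use.
System.schedulers_online()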


I’ve given Docker six cores and it’s a simple webapp, so it should be enough.

An update on my investigations.
I had the same behaviour on an Intel MacBook Pro running Fedora Linux: it did not work inside a Docker container but ran perfectly well outside one.
A friend suggested running the container with the --privileged option and … it works! It seems the Docker engine limits some resources, so I will investigate further to see which resources are limited and why.

I’ll keep this post updated as I think it can be of public interest.


I’ve updated Docker Desktop to the latest version and I don’t seem to have this issue anymore.
So I guess it was some kind of bug in my older Docker Desktop version.
