Oban Queues - Concurrent Jobs

en86 · June 21, 2022, 12:48pm

I’m using Oban to split an email sent to thousands of users into multiple jobs to avoid issues with spam. So an email to 5000 users gets split into 10 jobs, spaced a few seconds apart, each emailing 500 users.
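The batching described above can be sketched in plain Elixir. This is a minimal illustration, not the poster's code: `MailBatches.plan/3` is a hypothetical helper that chunks a recipient list and gives each batch a delay in seconds, with the first batch running immediately.

```elixir
defmodule MailBatches do
  # Hypothetical helper (not from the thread): split recipients into
  # fixed-size batches and give batch n a delay of n * spacing_seconds.
  def plan(recipients, batch_size, spacing_seconds) do
    recipients
    |> Enum.chunk_every(batch_size)
    |> Enum.with_index()
    |> Enum.map(fn {batch, index} ->
      %{recipients: batch, schedule_in: index * spacing_seconds}
    end)
  end
end

# 5000 placeholder user IDs, 500 per batch, 10 seconds apart
plans = MailBatches.plan(Enum.to_list(1..5000), 500, 10)

IO.inspect(length(plans))
# prints 10
IO.inspect(Enum.map(plans, & &1.schedule_in))
# prints [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

Each entry could then be enqueued with something like `MyApp.MailWorker.new(%{"user_ids" => batch}, queue: :mailers, schedule_in: delay) |> Oban.insert()`, where `MyApp.MailWorker` is a hypothetical worker and `schedule_in` is given in seconds. Passing user IDs rather than full user structs keeps the job args small, since args are stored as JSON.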

I’m not clear, though, on what exactly the queue setting for concurrent jobs is doing. For example, imagine this config:

queues: [mailers: 1]

What’s happening here? I had two theories about what it might be doing:

Theory A
If a job takes 3 seconds to run but jobs are spaced 1 second apart, then with a concurrency of only 1 the jobs would quickly fall behind schedule.

But if I’m spacing out emails by 10 seconds and the email code only takes about 1 or 2 seconds to execute, I don’t actually need a setting higher than 1, since a single process could handle this load.

Theory B
With a setting of 1, if a job fails, Oban won’t even attempt the next job until that first job eventually succeeds, so a setting of 1 could potentially create a huge bottleneck.

Does either of these accurately describe what the Oban queue concurrency setting is doing?

Thank you in advance for any help!

ruslandoga · June 21, 2022, 2:05pm

queues: [mailers: 1] means that at most one job from the mailers queue is executed at a time on a node. It’s unrelated to when jobs are scheduled or how many are scheduled at a given point in time. The queue takes a due job, runs it, and then, if other jobs are due, runs them in order afterwards. That means some jobs might be executed later than they were scheduled.
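To make the "some jobs run later than scheduled" point concrete, here is a toy model in plain Elixir. It is a sketch under simplifying assumptions, not Oban's implementation: one execution slot, jobs become available at their scheduled time, and available jobs run one after another in scheduled order. Job tuples are `{id, scheduled_at, duration}` in seconds.

```elixir
defmodule SerialQueueModel do
  # Toy model (not Oban's code) of a queue with a concurrency limit of 1.
  # Returns {id, scheduled_at, actual_start} for each job.
  def start_times(jobs) do
    {starts, _free_at} =
      jobs
      |> Enum.sort_by(fn {_id, scheduled_at, _duration} -> scheduled_at end)
      |> Enum.map_reduce(0, fn {id, scheduled_at, duration}, free_at ->
        # A job starts once it is due AND the single slot is free
        start = max(scheduled_at, free_at)
        {{id, scheduled_at, start}, start + duration}
      end)

    starts
  end
end

# Jobs scheduled 1 second apart that each take 3 seconds: the lag grows
behind = for i <- 0..4, do: {i, i, 3}
IO.inspect(SerialQueueModel.start_times(behind))
# prints [{0, 0, 0}, {1, 1, 3}, {2, 2, 6}, {3, 3, 9}, {4, 4, 12}]

# Jobs scheduled 10 seconds apart that each take 2 seconds: no lag
on_time = for i <- 0..4, do: {i, i * 10, 2}
IO.inspect(SerialQueueModel.start_times(on_time))
# prints [{0, 0, 0}, {1, 10, 10}, {2, 20, 20}, {3, 30, 30}, {4, 40, 40}]
```

Under this model, the arithmetic in Theory A holds: a limit of 1 only falls behind when jobs take longer than the gap between them.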

I should’ve read the question in full before attempting to answer.

Here’s a script that might be used to verify your theories.

queue.ex

# adapting https://github.com/wojtekmach/mix_install_examples/blob/main/oban.exs
# to make the job unsuccessful
Mix.install([
  {:ecto_sql, "~> 3.6.2"},
  {:postgrex, "~> 0.15.0"},
  {:oban, "~> 2.8"}
])

Application.put_env(:myapp, Repo, database: "mix_install_oban")

defmodule Repo do
  use Ecto.Repo,
    adapter: Ecto.Adapters.Postgres,
    otp_app: :myapp
end

defmodule Migration0 do
  use Ecto.Migration

  def change do
    Oban.Migrations.up()
  end
end

defmodule Main do
  def main do
    children = [
      Repo,
      {Oban, repo: Repo, plugins: [Oban.Plugins.Pruner], queues: [default: 1]} # concurrency limit of 1 for the default queue
    ]

    Repo.__adapter__().storage_down(Repo.config())
    Repo.__adapter__().storage_up(Repo.config())
    {:ok, _} = Supervisor.start_link(children, strategy: :one_for_one)

    Ecto.Migrator.run(Repo, [{0, Migration0}], :up, all: true)

    Oban.insert!(Worker.new(%{id: 1}))
    Oban.insert!(Worker.new(%{id: 2}))

    Oban.Job
    |> Repo.all()
  end
end

defmodule Worker do
  use Oban.Worker
  require Logger

  @impl true
  def perform(%Oban.Job{} = job) do
    Logger.info("running job #{job.id}")
    # Deliberately raise an ArithmeticError on every attempt so the job is always retried
    1 / 0
  end
end

Main.main()
:timer.sleep(:infinity)

> elixir queue.ex
17:15:00.848 [info]  running job 1
17:15:00.862 [info]  running job 2
17:15:17.845 [info]  running job 1
17:15:18.847 [info]  running job 2
17:15:36.929 [info]  running job 1
17:15:37.935 [info]  running job 2
17:16:01.038 [info]  running job 1
17:16:02.042 [info]  running job 2
17:16:30.146 [info]  running job 1
17:16:33.157 [info]  running job 2

So Theory B doesn’t seem to hold: Oban doesn’t get stuck on a single failing job, and the two jobs are retried in turn.

sorentwo · June 21, 2022, 2:32pm

The number in mailers: 1 is a concurrency limit. It regulates how many jobs may execute at once (concurrently) within that queue for that node.
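For reference, a sketch of where that limit lives in a typical config file. The app name `:my_app` and `MyApp.Repo` are placeholders, and the `default: 10` queue is only there to show that each queue has its own independent limit.

```elixir
# config/config.exs (sketch; :my_app and MyApp.Repo are placeholder names)
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [
    # at most 1 mailers job executes at a time on each node
    mailers: 1,
    # a separate queue with its own, independent limit
    default: 10
  ]
```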

Neither theory A nor B is quite right. A few notes to clarify:

Scheduling is purely a timestamp. Once that time arrives, the job becomes available to run.
Job execution is independent, meaning the failure of one job doesn’t block another job from running. In your scenario, if the first job fails, it is scheduled for a retry and the next available job may run.
Concurrency limits are local to the node. If you run two nodes, or even do blue/green deploys, then you’ll have an effective concurrency of 2 instead of 1.
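The retry behaviour described in these notes can be illustrated with another toy model in plain Elixir. It is a sketch, not Oban's code: one execution slot, every run takes `duration` seconds, jobs whose IDs are in `failing` fail on every attempt, and a failed job is rescheduled a fixed `backoff` seconds later rather than blocking the queue. Real Oban computes retry delays with its own backoff function; a fixed delay is an assumption made here for simplicity.

```elixir
defmodule RetryQueueModel do
  # Toy model (not Oban's code). Jobs are {id, due_at}; returns the run log
  # as {start_time, job_id, attempt} tuples. A job that fails on its last
  # allowed attempt is simply dropped from the pending list.
  def run(jobs, failing, duration, backoff, max_attempts) do
    pending = jobs |> Enum.map(fn {id, due_at} -> {due_at, id, 1} end) |> Enum.sort()
    loop(pending, 0, {failing, duration, backoff, max_attempts}, [])
  end

  defp loop([], _clock, _opts, log), do: Enum.reverse(log)

  defp loop([{due_at, id, attempt} | rest], clock, opts, log) do
    {failing, duration, backoff, max_attempts} = opts
    # The single slot runs the earliest-due job once it is due and the slot is free
    start = max(due_at, clock)
    finished = start + duration

    rest =
      if id in failing and attempt < max_attempts do
        # Reschedule the failed job and keep pending jobs ordered by due time
        Enum.sort([{finished + backoff, id, attempt + 1} | rest])
      else
        rest
      end

    loop(rest, finished, opts, [{start, id, attempt} | log])
  end
end

# Two jobs due at time 0 that always fail: 1s per run, retry 5s later, 3 attempts
IO.inspect(RetryQueueModel.run([{1, 0}, {2, 0}], [1, 2], 1, 5, 3))
# prints [{0, 1, 1}, {1, 2, 1}, {6, 1, 2}, {7, 2, 2}, {12, 1, 3}, {13, 2, 3}]
```

The two failing jobs take turns, much like the interleaved "running job 1" / "running job 2" lines in the script output earlier in the thread.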

en86 · June 21, 2022, 6:19pm

Thanks, that’s very helpful!