Long running Oban cron jobs

hubertlepicki · November 21, 2022, 3:07pm

I have a system that needs to check for work every 5 minutes, and then process the payload. Processing the payload can take significantly more than 5 minutes.

Currently, Oban just schedules extra jobs that get executed later. I have set the concurrency on the queue to 1, so no two jobs of this type execute at the same time, but the system still does a lot of unnecessary work after each long-running job completes.

Is there a setting / pattern I can use to avoid scheduling additional jobs while the cron job is running? //cc @sorentwo

dimitarvp · November 21, 2022, 3:21pm

What do you mean by that? The rescheduling for later if the job hasn’t finished yet?

I don’t know your project, but my instinctive reaction would be to use OTP directly here, e.g. have a single unique worker in the supervision tree with a loop inside that does its job, then calls Process.send_after on itself (5 minutes into the future), then repeats.
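For readers unfamiliar with the pattern described above, here is a minimal single-node sketch. The module name `PeriodicWorker` and the `:interval` and `:work` options are hypothetical, chosen only for illustration; it does nothing about cluster-wide uniqueness.

```elixir
defmodule PeriodicWorker do
  @moduledoc "Hypothetical single-node poller: does its work, then schedules the next run."
  use GenServer

  # `:interval` (milliseconds) and `:work` (a zero-arity function) are illustrative options.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.get(opts, :name, __MODULE__))
  end

  @impl GenServer
  def init(opts) do
    state = %{
      interval: Keyword.get(opts, :interval, :timer.minutes(5)),
      work: Keyword.fetch!(opts, :work)
    }

    # Trigger the first run right away without blocking init/1.
    send(self(), :run)
    {:ok, state}
  end

  @impl GenServer
  def handle_info(:run, state) do
    # The work runs inside this process, so two runs can never overlap.
    state.work.()

    # Schedule the next run only after the current one has finished.
    Process.send_after(self(), :run, state.interval)
    {:noreply, state}
  end
end
```

Because the next message is sent only after the work returns, the interval here is measured from the end of one run to the start of the next, not on a fixed wall-clock schedule.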

hubertlepicki · November 21, 2022, 3:27pm

I can do it… but then I need to make sure the worker is unique in the cluster and the cron thing does it for me. I mean, I know how to do it with more code, either have a lock in the database, or have a global name registered in the cluster but… current solution is good enough, just needs a little tweak, if available.

dimitarvp · November 21, 2022, 3:51pm

Ah, distribution is involved. Then you’re likely correct that getting it right would take more effort than seems worthwhile right now.

sorentwo · November 21, 2022, 4:08pm

@hubertlepicki You can manage long-running cron jobs without overlap with a small tweak to the job’s unique settings. Here’s a complete example of a cron that is scheduled to run every minute but purposefully takes 65s to complete:

defmodule SlowCron do
  use Oban.Worker,
    max_attempts: 1,
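    # Treat a job as a duplicate only while an earlier one is still
    # available or executing; completed jobs do not block new inserts.
    unique: [period: 300, states: ~w(available executing)a]

  @impl Oban.Worker
  def perform(%Oban.Job{} = job) do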
    IO.puts("Started #{job.id}, inserted at #{job.inserted_at}")

    Process.sleep(:timer.seconds(65))

    IO.puts("Finished #{job.id}")

    :ok
  end
end

Oban.Test.Repo.start_link()

Oban.start_link(
  repo: Oban.Test.Repo,
  queues: [default: 10],
  plugins: [{Oban.Plugins.Cron, crontab: [{"* * * * *", SlowCron}]}]
)

The console output shows that the job ids are sequential, but the insertion times skip over the overlapping minute:

Started 16144, inserted at 2022-11-21 16:01:00.409060Z
Finished 16144
Started 16145, inserted at 2022-11-21 16:03:00.429351Z
Finished 16145
Started 16146, inserted at 2022-11-21 16:05:00.447115Z

The important part is extending the unique period while overriding the unique states so they don’t include completed jobs. This is safe because cron will only run on a single node in your cluster (the leader, according to Oban.Peer.leader?/0) regardless of the unique settings.
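Applied to the original question (a check every 5 minutes whose processing can take much longer), the same idea might look like the sketch below. The names `MyApp.CheckForWork`, `MyApp.Repo`, `:my_app`, and the `:polling` queue are hypothetical, and the 3600 second period is an assumed value: the unique period has to be longer than the longest expected run, because a job inserted more than `period` seconds ago is no longer considered a duplicate.

```elixir
# lib/my_app/check_for_work.ex (hypothetical path and module name)
defmodule MyApp.CheckForWork do
  use Oban.Worker,
    queue: :polling,
    max_attempts: 1,
    # Assumed value: pick a period longer than the slowest run you expect.
    unique: [period: 3600, states: ~w(available executing)a]

  @impl Oban.Worker
  def perform(%Oban.Job{}) do
    # Check for work and process the payload here.
    :ok
  end
end

# config/config.exs (fragment, after `import Config`)
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [polling: 1],
  plugins: [{Oban.Plugins.Cron, crontab: [{"*/5 * * * *", MyApp.CheckForWork}]}]
```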

hubertlepicki · November 22, 2022, 12:14pm

@sorentwo that works, thank you so much!

@dimitarvp we’re paying @sorentwo precisely not to have to deal with all that xD.

dimitarvp · November 22, 2022, 12:35pm

lol

Well I don’t have the context but I immediately bookmarked his reply because it’s super helpful and I am sure I’ll need it one day…

hubertlepicki · November 23, 2022, 7:13am

That was 50% of why I asked. It’s not obvious from the documentation. I could have experimented with the unique: options, but that would have wasted some precious time I could spend playing with my kiddo. So I asked on Elixir Forum and tagged @sorentwo. Hope that’s fine. Now it’s documented for everyone who can google.

trisolaran · November 23, 2022, 12:35pm

That reminds me that I should also spend less time here and more time playing with my boy