Oban — Reliable and Observable Job Processing

Oban v0.9.0 has been released with a few important bug fixes and a couple of minor feature additions. Definitely upgrade if you’ve run into trouble in CI with multiple unscoped Oban.Migrations calls.

Thanks to everybody that has been using the library and reporting issues! :yellow_heart:

From the CHANGELOG

Added

  • [Oban] Add insert_all/2 and insert_all/4, corresponding to Ecto.Repo.insert_all/3 and Ecto.Multi.insert_all/5, respectively. @halostatue

  • [Oban.Job] Add to_map/1 for converting a changeset into a map suitable for database insertion. This is used by Oban.insert_all/2,4 internally and is exposed for convenience.
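
A minimal sketch of how the two additions fit together, assuming the default Oban instance is running. `MyApp.Workers.Mailer` and the `:mailers` queue are hypothetical names used only for illustration:

```elixir
# Build changesets without inserting them. MyApp.Workers.Mailer is a
# hypothetical worker module; only its name is stored on each job.
changesets =
  for user_id <- [1, 2, 3] do
    Oban.Job.new(%{user_id: user_id}, worker: MyApp.Workers.Mailer, queue: :mailers)
  end

# Insert every job with a single query, using the default Oban instance.
jobs = Oban.insert_all(changesets)

# to_map/1 converts a changeset into the map used for database insertion.
changesets
|> List.first()
|> Oban.Job.to_map()
|> IO.inspect(label: "insertable job")
```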

Changed

  • [Oban.Config] Remove the default queue value of [default: 10], which was overridden by Oban.start_link/1 anyhow.

  • [Oban.Telemetry] Allow the log level to be customized when attaching the default logger. The default level is :info, the same as it was before.
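
For example, to log job events at the debug level instead, something like this could go in the application's start callback (a sketch; where the call lives is up to you):

```elixir
# Attach the default logger with a custom level.
# Calling it without an argument keeps the :info default.
Oban.Telemetry.attach_default_logger(:debug)
```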

Fixed

  • [Oban.Migrations] Prevent invalid up and down targets when attempting to run migrations that have already been run. This was primarily an issue in CI, where the initial migration was unscoped and would migrate to the current version while a subsequent migration would attempt to migrate to a lower version. @jc00ke

  • [Oban.Job] Prevent a queue comparison with nil by retaining the default queue (default) when building uniqueness checks.

  • [Oban.Job] Set state to scheduled for jobs created with a scheduled_at timestamp. Previously the state was only set when schedule_in was used.
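
A quick way to check the scheduling fix in IEx, as a sketch. `MyApp.Worker` is a hypothetical worker name; per the entry above, both changesets are expected to carry the scheduled state:

```elixir
# An explicit timestamp one hour from now.
in_an_hour = DateTime.add(DateTime.utc_now(), 3600, :second)

# Scheduling by timestamp, which previously left the state untouched.
by_timestamp = Oban.Job.new(%{id: 1}, worker: MyApp.Worker, scheduled_at: in_an_hour)

# Scheduling by a relative offset in seconds.
by_offset = Oban.Job.new(%{id: 1}, worker: MyApp.Worker, schedule_in: 3600)

# Compare the state recorded on each changeset.
IO.inspect(Ecto.Changeset.get_field(by_timestamp, :state), label: "scheduled_at")
IO.inspect(Ecto.Changeset.get_field(by_offset, :state), label: "schedule_in")
```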

7 Likes

:heart: Thanks for the quick fix + release of the migration issue! :heart:

1 Like

Oban v0.10.0 is published with a huge performance improvement, a change to the default logging behavior and a major restructuring of notifications. The result is much faster (up to 258,000x faster with extremely large queues holding millions of unprocessed jobs) and more fault tolerant (able to run without a database connection and recover safely when the database comes back up).

This release does involve a migration and I highly recommend it.

From the CHANGELOG

Migration Optional (V5)

Tables with a lot of available jobs (hundreds of thousands to several million) are prone to timeouts when fetching new jobs. The planner fails to optimize using the index available on queue, state and scheduled_at, forcing both a slow sort pass and an expensive bitmap heap scan.

This migration drops the separate indexes in favor of a single composite index. The resulting query is up to 258,757x faster on large tables while remaining usable for all of the other maintenance queries.

History of the EXPLAIN ANALYZE output as the query was optimized is available here: https://explain.depesz.com/s/9Vh7
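
For anyone unsure how to apply it, here is a minimal sketch of a dedicated migration, assuming the scoped `Oban.Migrations.up/1` call with a `version` option. The module name is made up, and the `down` step is left out because the right rollback target should be checked against the docs for your installed version:

```elixir
defmodule MyApp.Repo.Migrations.UpgradeObanJobsToV5 do
  use Ecto.Migration

  # Scope the migration to V5 so that it can't collide with an earlier,
  # unscoped Oban migration in the same project.
  def up do
    Oban.Migrations.up(version: 5)
  end
end
```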

Changed

  • [Oban.Config] Change the default for verbose from true to false. Also, :verbose now accepts only false and standard logger levels. This change aims to prevent crashes due to conflicting levels when the repo’s log level is set to false.
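
As a config sketch, with `:my_app` and `MyApp.Repo` standing in for your own application and repo:

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  # Either false (the new default) or a standard logger level such as :debug.
  verbose: :debug
```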

Fixed

  • [Oban.Notifier] Restructure the notifier in order to isolate producers from connection failures. Errors or loss of connectivity in the notification connection no longer kill the notifier and have no effect on the producers. @axelson

10 Likes

Awesome! Huge thanks for such outstanding package and the continuing effort to improve it!

6 Likes

I took a look through the readme file on GitHub and didn’t see this, but is it possible to have scheduled tasks that get repeated on a set interval? For example, every Monday at 2pm run X job (or any cron-like interval variant)?

1 Like

I’ve used a combination of Quantum and Oban for this. Quantum takes cron config, and if the action is super fast, I just do it. If it may fail or can take a while, I have Quantum simply enqueue a job.

3 Likes

Thanks. I am using Quantum too. I guess repeated scheduling is better off being kept as a third party dependency? Oban is so well put together that not supporting repeated scheduling seems like it’s probably by design?

1 Like

It is kind of by design. I have a separate package called “oban_cron” that I haven’t published yet, but it does exactly what you’re asking about. I’ve debated making it part of oban itself, but it works nicely as a separate package.

Quantum is really powerful, but it doesn’t make sense in every situation IMO. There are a couple of downsides that I hoped to address with a focused package:

  1. Work without connected nodes. Quantum uses node communication for leadership, to prevent double-enqueuing a job.
  2. Have a smaller footprint. Quantum is a lot of code when all you want to specify is {"0 14 * * MON", MyApp.SomeWorker} in some configuration.

6 Likes

I remember in the Rails days using Sidekiq but then also lugging around Clockwork just for scheduling. When glancing at your readme one of the first things I looked for was repeated scheduled tasks.

I wonder what others think about putting oban_cron into the main repo.

6 Likes

Informally I’ll consider “hearts” on this post to be in favor of including it directly in Oban. If anybody has arguments against inclusion please post!

14 Likes

reads post, hearts it as confirmation

Lol, but really, it’s easy to not use it, it has no overhead when it isn’t needed, and it’s very specific to Oban anyway, so…

2 Likes

I think that it would be a great addition to Oban.

The current version of Quantum is unstable when clustering in certain cases. We’ve experienced some of them in the company I work for, which caused some undesirable effects such as tasks not running or running once per node in the cluster.
This is acknowledged in Quantum and they are working on a solution.

Oban has been rock solid and reliable since we started using it in the project. So it could be a wonderful alternative.

4 Likes

@nickjanetakis @belaustegui I merged Periodic (CRON) support into Oban today. Please check it out, it will be included in v0.11 sometime next week.

Thanks for the encouragement!

12 Likes

Nice work. Looking forward to replacing Quantum with this. After just glancing at your commit, that makes me really happy to see. I had nothing against Quantum but being able to drop a whole library with a ton of code for a few dozen lines of code is a huge win.

One question. In your new docs you put “Jobs are considered unique for most of each minute”. What type of race conditions or edge cases should we be aware of to hit the points where it might not be unique?

1 Like

Jobs are marked as unique for 59 seconds, not 60 seconds. There has to be some wiggle room between the unique period and the next enqueue cycle. That leaves a one second window where a theoretical double-enqueue is possible. That situation would only happen if you can restart your node fast enough that it enqueued at the start of one second, booted up and enqueued at the end of the same second.

If I’m understanding correctly, this supports running a job on a single node in a cluster periodically.

It might be beyond the scope of Oban, but is there a way to run a job on every node in a cluster periodically? I’ve used Quantum that way for things like refreshing credentials periodically.

1 Like

Right. That’s the goal.

I hadn’t considered that use case. It would certainly be possible: eliminate the transaction lock and remove the unique period while scheduling.

You could kind of hack it with the current implementation by overriding the unique period within your job so that it’s very short:

use Oban.Worker, queue: "scheduled", unique: [period: 1]

If the nodes start at different times that would work, but it wouldn’t be very reliable.

I’ll think about this use case a bit.

3 Likes

Oban v0.11.0 is published with a variety of bug fixes and the addition of CRON jobs. There is an optional migration that will prevent issues recording beats when job ids get into the 64-bit range.

Anybody using the UI beta should upgrade; the oban_update notification change is essential to keeping stats updated.

From the CHANGELOG

Migration Optional (V6)

Job ids greater than 2,147,483,647 (PG int limit) can’t be inserted into the running array on oban_beats. The array that Ecto defines uses int instead of bigint, which can’t store the larger integers. This migration changes the column type to bigint[], a locking operation that may take a few seconds.

Added

  • [Oban] Added crontab support for automatically enqueuing jobs on a fixed schedule. A combination of transactional locks and unique jobs prevents scheduling duplicate jobs.
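
A config sketch for the crontab option, reusing the Monday at 2pm expression from earlier in the thread. `:my_app`, `MyApp.Repo` and `MyApp.SomeWorker` are placeholders for your own names:

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  crontab: [
    # Enqueue MyApp.SomeWorker every Monday at 14:00.
    {"0 14 * * MON", MyApp.SomeWorker}
  ]
```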

Fixed

  • [Oban.Migrations] Add a comment when migrating oban_jobs to V5 and when rolling back down to V4.

  • [Oban.Query] Apply the configured log level to unique queries.

  • [Oban.Notifier] Prevent open connections from accumulating when the circuit is tripped during the connection phase. This change may leave notifications in a state where they aren’t listening to all channels.

Changed

  • [Oban.Notifier] Replay oban_update notifications to subscribed processes.

10 Likes

Oban v0.12.0 is out with some fun features, testing improvements, bug fixes and a helpful (optional) migration for large pruning operations. Thanks to all of the contributors who made this one possible!

From the CHANGELOG

Migration Optional (V7)

The queries used to prune by limit and age are written to utilize a single partial index for a huge performance boost on large tables. The new V7 migration will create the index for you—but that may not be ideal for tables with millions of completed or discarded jobs because it can’t be done concurrently.

If you have an extremely large jobs table you can add the index concurrently in a dedicated migration:

create index(
         :oban_jobs,
         ["attempted_at desc", :id],
         where: "state in ('completed', 'discarded')",
         name: :oban_jobs_attempted_at_id_index,
         concurrently: true
       )
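
One thing worth knowing when doing this: Postgres can’t build an index concurrently inside a transaction, and Ecto wraps migrations in one by default. A sketch of the full dedicated migration, with a hypothetical module name and with the DDL transaction and migration lock disabled:

```elixir
defmodule MyApp.Repo.Migrations.AddObanJobsAttemptedAtIdIndex do
  use Ecto.Migration

  # Concurrent index builds must run outside of a transaction.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def change do
    create index(
             :oban_jobs,
             ["attempted_at desc", :id],
             where: "state in ('completed', 'discarded')",
             name: :oban_jobs_attempted_at_id_index,
             concurrently: true
           )
  end
end
```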

Added

  • [Oban] Add start_queue/3 and stop_queue/2 for dynamically starting and stopping supervised queues across nodes.

  • [Oban] Add drain_queue/3 to accept drain options. with_scheduled: true allows draining scheduled jobs.

  • [Oban] Expose circuit_backoff as a “twiddly” option that controls how long tripped circuit breakers wait until re-opening.

  • [Oban.Testing] Accept a value/delta tuple for testing timestamp fields. This allows more robust testing of timestamps such as scheduled_at.

  • [Oban.Telemetry] Emit [:oban, :trip_circuit] and [:oban, :open_circuit] events for circuit breaker activity. Previously an error was logged when the circuit was tripped, but there wasn’t any way to monitor circuit breakers.

    Circuit breaker activity is logged by the default telemetry logger (both :trip_circuit and :open_circuit events).
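
If you would rather monitor the circuit breaker events yourself, a custom handler can be attached with the standard `:telemetry` API. This is only a sketch: the module and handler id are hypothetical, and since the exact shape of the measurements and metadata isn’t described above, the handler just inspects them:

```elixir
defmodule MyApp.CircuitMonitor do
  require Logger

  # The two events named in the changelog entry above.
  @events [[:oban, :trip_circuit], [:oban, :open_circuit]]

  # Call once at application start.
  def attach do
    :telemetry.attach_many("my-app-oban-circuit", @events, &__MODULE__.handle_event/4, nil)
  end

  # Log whatever the event carries; swap this for metrics or alerting.
  def handle_event([:oban, event], measurements, metadata, _config) do
    Logger.warning("oban #{event}: #{inspect(measurements)} #{inspect(metadata)}")
  end
end
```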

Fixed

  • [Oban.Query] Avoid using prepared statements for all unique queries. This forces Postgres to use a custom plan (which utilizes the compound index) rather than falling back to a generic plan.

  • [Oban.Job] Include all permitted fields when converting a Job to a map, preserving any optional values that were either specified by the user or came via Worker defaults.

  • [Oban.Migrations] Guard against missing migration modules in federated environments.

Changed

  • [Oban] Allow the multi name provided to Oban.insert/3,4 to be any term, not just an atom.

  • [Oban.Query] Use a consistent and more performant set of queries for pruning. Both pruning methods are optimized to utilize a single partial index.
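
A small sketch of the multi name change, assuming a hypothetical `MyApp.Workers.Onboard` worker and `MyApp.Repo`. A tuple name is handy when several jobs are added to the same multi:

```elixir
changeset = Oban.Job.new(%{account_id: 1}, worker: MyApp.Workers.Onboard)

# The multi name is a tuple here rather than an atom.
Ecto.Multi.new()
|> Oban.insert({:onboard_job, 1}, changeset)
|> MyApp.Repo.transaction()
```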

8 Likes

For people interested in Oban, Parker Selbert (@sorentwo) was recently on the ElixirMix podcast talking about it.

https://devchat.tv/elixir-mix/emx-079-oban-with-parker-selbert/

9 Likes