tl;dr Announcing Oban, an Ecto-based job processing library with a focus on reliability and historical observability.
After spending nearly a year building Kiq, an Elixir port of Sidekiq with most of the bells and whistles, I realized that the model was all wrong. Most of us don’t want to rely on Redis for production data, and Sidekiq is a largely proprietary legacy system. Neither is a great foundation for a reliable job processor.
So, I took the best parts of Kiq, drew some inspiration from EctoJob, and put together Oban. The primary goals are reliability, consistency, and observability. It is fundamentally different from other background job processing tools because it retains job data for historic metrics and inspection.
Here are some of the marquee features that differentiate it from other job processors (pulled straight from the README):
- Isolated Queues — Jobs are stored in a single table but are executed in distinct queues. Each queue runs in isolation, ensuring that jobs in a single slow queue can’t back up other, faster queues (a configuration and worker sketch follows this list).
- Queue Control — Queues can be paused, resumed, and scaled independently at runtime (the control calls are sketched after this list).
- Job Killing — Jobs can be killed in the middle of execution regardless of which node they are running on. This stops the job at once and flags it as discarded.
- Triggered Execution — Database triggers ensure that jobs are dispatched as soon as they are inserted into the database.
- Scheduled Jobs — Jobs can be scheduled at any time in the future, down to the second (see the scheduling sketch below).
- Job Safety — When a process crashes or the BEAM is terminated, executing jobs aren’t lost; they are quickly recovered by other running nodes or immediately when the node is restarted.
- Historic Metrics — After a job is processed the row is not deleted. Instead, the job is retained in the database to provide metrics. This allows users to inspect historic jobs and to see aggregate data at the job, queue, or argument level (see the query sketch below).
- Node Metrics — Every queue broadcasts metrics during runtime. These are used to monitor queue health across nodes.
- Queue Draining — Queue shutdown is delayed so that slow jobs can finish executing before the node stops.
- Telemetry Integration — Job life-cycle events are emitted via Telemetry. This enables simple logging, error reporting, and health checkups without plug-ins (a handler sketch follows the list).
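For context, here’s roughly what queue configuration and a worker look like. The queue names, limits, and module names are made up for illustration, and the exact `perform` signature has varied across Oban versions, so treat this as a sketch:

```elixir
# config/config.exs — each queue gets its own concurrency limit and
# runs in isolation (hypothetical queue names and limits).
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10, media: 5, events: 20]
```

```elixir
defmodule MyApp.Business do
  use Oban.Worker, queue: :events, max_attempts: 10

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"id" => id}}) do
    # Real work goes here; returning :ok marks the job completed.
    IO.inspect(id)
    :ok
  end
end

# Enqueue a job in the :events queue through your Ecto repo.
%{id: 1}
|> MyApp.Business.new()
|> MyApp.Repo.insert()
```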
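Queue control comes down to a few function calls at runtime. Another sketch; the exact argument shapes have shifted between releases, so check the docs for the version you’re on:

```elixir
# Stop fetching new jobs from the :media queue; running jobs finish.
Oban.pause_queue(queue: :media)

# Start fetching jobs again.
Oban.resume_queue(queue: :media)

# Raise the local concurrency limit for :media to 20.
Oban.scale_queue(queue: :media, limit: 20)
```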
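Scheduling happens at insert time, reusing the hypothetical worker from above:

```elixir
# Run roughly one minute from now.
%{id: 2}
|> MyApp.Business.new(schedule_in: 60)
|> MyApp.Repo.insert()

# Run at an exact timestamp.
an_hour_from_now = DateTime.add(DateTime.utc_now(), 3600, :second)

%{id: 3}
|> MyApp.Business.new(scheduled_at: an_hour_from_now)
|> MyApp.Repo.insert()
```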
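Because finished jobs stay in the `oban_jobs` table, historic questions reduce to ordinary Ecto queries. A sketch that counts completed jobs per queue through the `Oban.Job` schema:

```elixir
import Ecto.Query

# Count completed jobs per queue using the retained rows.
MyApp.Repo.all(
  from j in Oban.Job,
    where: j.state == "completed",
    group_by: j.queue,
    select: {j.queue, count(j.id)}
)
```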
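And attaching a Telemetry handler is plain `:telemetry`. The event names and metadata below follow the current documentation and have been renamed between releases, so again, a sketch:

```elixir
defmodule MyApp.ObanLogger do
  require Logger

  def handle_event([:oban, :job, :stop], measurements, %{job: job}, _config) do
    # Duration arrives in native time units; convert for readability.
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    Logger.info("completed #{job.worker} in queue=#{job.queue} in #{ms}ms")
  end
end

:telemetry.attach(
  "oban-job-logger",
  [:oban, :job, :stop],
  &MyApp.ObanLogger.handle_event/4,
  nil
)
```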
One more thing! A stand-alone dashboard built on Phoenix LiveView is in the works.
The killer feature for any job processor is the UI. Every sizable app I know of relies on a web UI to introspect and manage jobs. It is very much a work in progress, but here is a preview of the UI running in an environment with constant job generation: