PragTob
Background job queues: When to use? When not to use? Which one to use?
Hello everyone,
I know we've had quite a few threads about background job processing (I read through lots of them), but it remains a hotly debated topic and something that people migrating over from other languages (especially ones with a GIL, like Ruby or Python) have questions about. A friendly company wanting to try out Elixir just asked me whether they need a background job processing system and, if so, which one. I couldn’t give them a great answer.
I thought I’d summarize my understanding (I’m not an expert by any means) and kick off some discussion. This is where you come in. I’d love all your input, especially on whether my understanding is correct and on which libraries/setups you have used and would recommend.
When to reach for a background job tool
“You don’t need background job processing in Elixir/Erlang.” This is a sentiment I read a lot, especially in the early days of this wonderful forum. I think it’s somewhat of a misunderstanding. No, I don’t need a background job processing system just to achieve parallelism.
What can I easily do in parallel?
- I want to do n things in parallel and aggregate results, example: I want to get data from n different data sources (like recommendation engines) and then aggregate them - Task.async + Task.await
- I just want something to happen but don’t care when it finishes - like for instance image processing, some updated caching data…
In summary: probably tasks that are unlikely to fail or can be retried immediately, that can be done right now, and where you can afford to keep the system running while they are executing. I know hot code upgrades are a thing, but from my understanding if you don’t absolutely need them they’re discouraged for complexity reasons.
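The two cases above can be sketched roughly as follows. This is a minimal illustration, not a recommendation for production code: the module name `ParallelSketch` and the three data sources are hypothetical stand-ins for calls to real services.

```elixir
defmodule ParallelSketch do
  # Hypothetical data sources; each function stands in for a call to a
  # recommendation engine or a similar remote service.
  def default_sources do
    [
      fn -> [:book_a, :book_b] end,
      fn -> [:book_c] end,
      fn -> [:book_a, :book_d] end
    ]
  end

  # Fan out: one task per source. Fan in: await each task in order and
  # merge the results, dropping duplicates.
  def aggregate(sources \\ default_sources(), timeout \\ 5_000) do
    sources
    |> Enum.map(&Task.async/1)
    |> Enum.flat_map(&Task.await(&1, timeout))
    |> Enum.uniq()
  end

  # Fire and forget: the caller does not wait for the result and is not
  # linked to the task, so a failure or a node restart loses the work.
  def fire_and_forget(fun) when is_function(fun, 0) do
    Task.start(fun)
  end
end
```

In a real application one would usually start fire-and-forget work under a Task.Supervisor rather than with a bare Task.start, but either way the work only lives in memory.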
What probably needs some more advanced system?
(please correct me if any of these are easily done in just elixir/erlang)
- I want the system to be robust to system restarts/system crashes (which shouldn’t happen, right?) - because if these happen then you lose the job that was executing and not done (you can stop the server from shutting down afaik though, which helps for restarts)
- I want to have exponential back off retries - this means that a retry might happen 2 hours in the future, which wouldn’t be feasible to delay the application restarting for so long
- executing jobs in the future at all - a restart would clear these as well so you’d lose them
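To make the restart problem concrete, here is a small sketch of exponential back off done purely in memory. The module name `BackoffSketch`, the one second base delay and the `{:retry, job, attempt}` message shape are assumptions made for this illustration.

```elixir
defmodule BackoffSketch do
  # Delay before retry number `attempt` (1-based):
  # base, 2 * base, 4 * base, 8 * base, ...
  def delay_ms(attempt, base_ms \\ 1_000) when attempt >= 1 do
    base_ms * round(:math.pow(2, attempt - 1))
  end

  # Schedules a retry message to the calling process. The timer exists only
  # in the memory of the running node, so a restart or crash drops it
  # without any trace.
  def schedule_retry(job, attempt, base_ms \\ 1_000) do
    Process.send_after(self(), {:retry, job, attempt}, delay_ms(attempt, base_ms))
  end
end
```

The delays grow quickly, so after a handful of failures the next attempt is far enough in the future that keeping it only in a process is risky. That is the gap the externally backed libraries below try to close.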

As a concrete example for what I think I need a background job queue:
I want to notify a partner system of something:
- this needs to be delivered at least once
- partner systems are down rather frequently sometimes for hours, so I want to retry ~5 times with exponential back off but also have the possibility to retry manually after that
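As a rough sketch of what a durable queue has to track for this example, the job can be modelled as a record with an attempt counter, a state and a next run time. Everything here is hypothetical (the module name `NotifyJobSketch`, the field names, the limit of 5 attempts, the 60 second base delay); it is not how any of the libraries below store their jobs. The record would have to live in durable storage such as a database for it to survive a restart.

```elixir
defmodule NotifyJobSketch do
  @max_attempts 5
  @base_seconds 60

  # A new job record as it might be written to a durable backend.
  def new(payload, now) do
    %{payload: payload, attempt: 0, state: :scheduled, run_at: now}
  end

  # Called after a delivery attempt with :ok or {:error, reason}.
  # The job is only marked :done once the partner has acknowledged it, so a
  # crash before this point leads to a repeat delivery (at least once).
  def record_result(job, :ok, _now) do
    %{job | attempt: job.attempt + 1, state: :done}
  end

  def record_result(job, {:error, _reason}, now) do
    attempt = job.attempt + 1

    if attempt >= @max_attempts do
      # Automatic retries are exhausted: park the job for a manual retry.
      %{job | attempt: attempt, state: :needs_manual_retry}
    else
      # Exponential back off: 60s, 120s, 240s, 480s.
      delay = @base_seconds * round(:math.pow(2, attempt - 1))
      %{job | attempt: attempt, state: :scheduled, run_at: DateTime.add(now, delay, :second)}
    end
  end

  # A manual retry resets the counter and makes the job due immediately.
  def retry_manually(%{state: :needs_manual_retry} = job, now) do
    %{job | attempt: 0, state: :scheduled, run_at: now}
  end
end
```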
I like how the exq README puts it:
If you need a durable jobs, retries with exponential backoffs, dynamically scheduled jobs in the future - that are all able to survive application restarts, then an externally backed queueing library such as Exq could be a good fit.
Existing queue systems
- rihanna - PostgreSQL storage, uses advisory locks
- exq - Redis backed, compatible with Sidekiq format - I like the “do you need exq?” section
- verk - Redis based as well, also supports sidekiq format
- que - backed by Mnesia which is a database builtin to erlang/otp so no extra infrastructure
- toniq - uses redis, hasn’t seen an update in over a year
- honeydew - pluggable storage, featuring in memory, Mnesia and ecto queues.
- ecto_job - backed by PostgreSQL, focussed on transactional behaviour, uses pg_notify so doesn’t do any database polling afaik (this might be true for others here, I just know it for this one)
- kiq - a rather new library, also redis backed and aiming at sidekiq compatibility, it was under heavy development around the turn of the year
- faktory_worker_ex - a worker for Mike Perham’s new more server based system faktory - would especially be interested in opinions/experiences here.
- gen_queue - a generic interface to different queue systems mentioned above and others for flexibility
What I find interesting is that our forum discussions are often very focussed on how we can do it just in the BEAM, which I quite like - but we have comparatively few libraries that implement it BEAM/OTP only. Partly because people have problems with mnesia. Something that I found in the discussions, but apparently with no library to go along with it, is using dets for storage.
Other things
I find this exchange between @benwilson512 and @sasajuric very interesting
Also in the discussion of course gen_stage comes up, for processing large amounts of data.
Discussion Points
- Are there more things that we should do only in the BEAM/OTP?
- What are other scenarios where we should reach for a background job processing system?
- What library or setup can you recommend?
Most Liked
lpil
I’m a Rihanna user and recently became a maintainer. For my use, Postgres for persistence is the real winner, as in my application I care little about performance and a lot about durability. I could attempt to manage the queue state within the cluster but I feel more confident entrusting this job to the fantastic piece of engineering that is Postgres. It also adds no additional operational overhead as I don’t need to form a cluster or add a dedicated external message queue (or Redis).
I’ve previously used Redis backed queues extensively and for me they hit an uncomfortable middle ground. They lack the durability guarantees of Postgres (or similar), they require me to deploy and maintain a Redis cluster, and lack the performance potential of working within the cluster.
keathley
I really like Kafka and have used it heavily for a few years now. But getting it running smoothly can be difficult depending on how familiar you are with ops and JVMs. At the end of the day you’re running a stateful service on a bunch of JVMs. So you’ll need a good understanding of how to get metrics out of the jvm and all of your boxes, you’ll wanna tune your jvms, you’ll want to tune your kafka setup, etc. The data in kafka can’t last forever so typically you window the available data for a limited time; generally no more than a month. So you’ll need a way to rotate the logs and shove them into s3 or some other long term storage. On top of all of that you most likely also need to run a zookeeper as well so you have to do all of that same ops work but this time for zookeeper.
If you’re just getting started I recommend that people vendor their kafka setup. Depending on how much scale you need it’ll probably run you somewhere between $500 and $3000 a month. That’s much cheaper than paying for a dedicated ops team, so if it’s something you really need then it’ll be worth the expense IMO. But if you don’t need kafka’s properties (high write throughput and replicated, durable messages) then it may not be worth it.
jeremyjh
I thought I would just mention that we’ve been happy users of honeydew for many months in production now. One of the nice things about honeydew’s Ecto queues is that they can be implemented as just a couple of columns added to an existing table. This way you can be sure a given entity has only one job scheduled to run on it at a time, and you can do things like have the default for a new record be to schedule a job related to it. It is also helpful that scheduling a job is part of the same transaction as the other operations you are doing in your application, so a job is only actually scheduled if the transaction it is a part of goes through.








