Hello everyone,
I know we have had quite a few threads about background job processing (I read through lots of them), but it remains a hotly debated topic and something that people migrating from other languages (especially ones with a GIL like Ruby or Python) have questions about. A friendly company wanting to try out Elixir just asked me whether they need a background job processing system and, if so, which one. I couldn’t give them a great answer.
I thought I’d summarize my understanding (I’m not an expert by any means) and kick off some discussion. This is where you come in. I’d love all your input, especially on whether my understanding is correct and on what libraries/setups you have used and would recommend.
When to reach for a background job tool
“You don’t need background job processing in Elixir/Erlang” is a sentiment I read a lot, especially in the early days of this wonderful forum. I think it’s somewhat of a misunderstanding. No, I don’t need a background job processing system just to achieve parallelism, but parallelism is not the only reason to reach for one.
What can I easily do in parallel?
- I want to do n things in parallel and aggregate results, example: I want to get data from n different data sources (like recommendation engines) and then aggregate them - Task.async + Task.await
- I just want something to happen but don’t care when it finishes - like for instance image processing, some updated caching data…
In summary, probably tasks that are unlikely to fail or can be retried immediately, that can be done right now, and where you can afford not to turn the system off while they are executing. I know hot code upgrades are a thing, but from my understanding, if you don’t absolutely need them, they’re discouraged for complexity reasons.
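To make the two cases above concrete, here is a minimal sketch using only the standard library. The module name, the `fetch/1` function and its fake latency are made up for illustration; a real data source call would go in its place.

```elixir
defmodule ParallelSketch do
  # Hypothetical stand-in for calling a recommendation engine or other data source.
  def fetch(source) do
    Process.sleep(50)
    {source, "result from #{source}"}
  end

  # Case 1: start one task per source, then wait for all of them.
  # Results come back in the same order as the input list.
  def fetch_all(sources, timeout \\ 5_000) do
    sources
    |> Enum.map(fn source -> Task.async(fn -> fetch(source) end) end)
    |> Enum.map(fn task -> Task.await(task, timeout) end)
  end

  # Case 2: fire and forget. The caller gets {:ok, pid} back immediately
  # and never sees the result.
  def warm_cache(key) do
    Task.start(fn -> fetch(key) end)
  end
end
```

Calling `ParallelSketch.fetch_all([:a, :b, :c])` takes roughly as long as the slowest source rather than the sum of all three. For the fire-and-forget case in a real application, starting the task under a `Task.Supervisor` is probably the better choice, so the work is at least supervised while the node is up.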
What probably needs some more advanced system?
(please correct me if any of these are easily done in just elixir/erlang)
- I want the system to be robust to system restarts/system crashes (which shouldn’t happen, right?) - because if these happen then you lose any job that was executing and not yet done (afaik you can delay the server shutting down until running jobs finish though, which helps for restarts)
- I want to have exponential back off retries - this means that a retry might happen 2 hours in the future, and it isn’t feasible to delay an application restart for that long
- executing jobs in the future at all - a restart would clear these as well so you’d lose them
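The restart problem in the points above can be shown with a small sketch. This is a hypothetical in-memory scheduler (the module and function names are mine, not from any library) that keeps future jobs in process state and uses `Process.send_after/3` as the timer.

```elixir
defmodule InMemoryScheduler do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Ask the server to run `fun` after `delay_ms` milliseconds.
  def schedule(server, fun, delay_ms), do: GenServer.cast(server, {:schedule, fun, delay_ms})

  # Number of jobs that are scheduled but have not run yet.
  def pending(server), do: GenServer.call(server, :pending)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_cast({:schedule, fun, delay_ms}, jobs) do
    ref = make_ref()
    # The timer and the job both live only in this process.
    Process.send_after(self(), {:run, ref}, delay_ms)
    {:noreply, Map.put(jobs, ref, fun)}
  end

  @impl true
  def handle_call(:pending, _from, jobs), do: {:reply, map_size(jobs), jobs}

  @impl true
  def handle_info({:run, ref}, jobs) do
    {fun, rest} = Map.pop(jobs, ref)
    if fun, do: fun.()
    {:noreply, rest}
  end
end
```

If you schedule a job an hour out, stop the process (or the whole node) and start it again, `pending/1` reports 0: the job is simply gone. That is the gap a durable store is meant to close.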
As a concrete example for what I think I need a background job queue:
I want to notify a partner system of something:
- this needs to be delivered at least once
- partner systems are down rather frequently sometimes for hours, so I want to retry ~5 times with exponential back off but also have the possibility to retry manually after that
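For illustration, here is roughly what the retry part looks like when done naively in plain Elixir. The module name, the option names and the `{:ok, _}` / `{:error, _}` return convention are assumptions for this sketch, and the notification itself is passed in as a function.

```elixir
defmodule RetrySketch do
  # Delay before retry number `attempt` (1-based): base, 2 * base, 4 * base, ...
  # Bitwise.bsl(1, n) is 2 to the power of n.
  def backoff_ms(attempt, base_ms), do: base_ms * Bitwise.bsl(1, attempt - 1)

  # Calls `fun` until it returns {:ok, _} or `max_attempts` calls have failed.
  def with_retries(fun, opts \\ []) do
    max_attempts = Keyword.get(opts, :max_attempts, 5)
    base_ms = Keyword.get(opts, :base_ms, 1_000)
    do_try(fun, 1, max_attempts, base_ms)
  end

  defp do_try(fun, attempt, max_attempts, base_ms) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, reason} when attempt >= max_attempts ->
        # Out of attempts: hand the failure back so it can be retried manually.
        {:error, {:gave_up, reason}}

      {:error, _reason} ->
        # The waiting happens inside this process, so it is lost on restart.
        Process.sleep(backoff_ms(attempt, base_ms))
        do_try(fun, attempt + 1, max_attempts, base_ms)
    end
  end
end
```

With 5 attempts there are 4 waits of 1x, 2x, 4x and 8x the base delay. Once the base is in the range of minutes, the process sits in `Process.sleep/1` for hours, and both the pending retry and the `{:gave_up, reason}` result exist only in memory. That is why I think this use case wants something durable.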
I like how the exq README puts it:
If you need a durable jobs, retries with exponential backoffs, dynamically scheduled jobs in the future - that are all able to survive application restarts, then an externally backed queueing library such as Exq could be a good fit.
Existing queue systems
- rihanna - PostgreSQL storage, uses advisory locks
- exq - Redis backed, compatible with Sidekiq format - I like the “do you need exq?” section
- verk - Redis based as well, also supports Sidekiq format
- que - backed by Mnesia, which is a database built into Erlang/OTP, so no extra infrastructure
- toniq - uses Redis, hasn’t seen an update in over a year
- honeydew - pluggable storage, featuring in memory, Mnesia and ecto queues.
- ecto_job - backed by PostgreSQL, focussed on transactional behaviour, uses pg_notify so doesn’t do any database polling afaik (might be true for others here, I just know it for this one)
- kiq - a rather new library, also Redis backed and aiming at Sidekiq compatibility, it was under heavy development around the turn of the year
- faktory_worker_ex - a worker for Mike Perham’s new, more server based system Faktory - would especially be interested in opinions/experiences here.
- gen_queue - a generic interface to different queue systems mentioned above and others for flexibility
What I find interesting is that our forum discussions are often very focussed on how we can do it just in the BEAM, which I quite like - but we have comparatively few libraries that implement it BEAM/OTP only. Partly that is because people have problems with Mnesia. Something that I found in the discussions, but apparently with no library to go along with it, is using dets for storage.
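To give an idea of what the dets approach could look like, here is a minimal sketch of a job store on top of `:dets` from Erlang/OTP. The module name, table name and job shape are hypothetical, and this is nowhere near a full queue (no locking, no retries, no scheduling), just persistence across a close and reopen.

```elixir
defmodule DetsJobStore do
  # Hypothetical table name, chosen only for this sketch.
  @table :job_store_sketch

  # Opens (or creates) the on-disk table at `path`.
  def open(path) do
    :dets.open_file(@table, file: String.to_charlist(path), type: :set)
  end

  # Stores a job under `id`, overwriting any previous entry with that id.
  def put(id, job), do: :dets.insert(@table, {id, job})

  # Returns all stored {id, job} tuples.
  def all, do: :dets.match_object(@table, :_)

  # Removes a job, for example once it has completed.
  def delete(id), do: :dets.delete(@table, id)

  def close, do: :dets.close(@table)
end
```

A job written with `put/2` is still returned by `all/0` after `close/0` and a fresh `open/1` on the same path, which is the property the in-memory approaches lack. One known limitation to keep in mind is that a dets file cannot grow beyond 2 GB.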
Other things
I find this exchange between @benwilson512 and @sasajuric very interesting
Also in the discussion of course gen_stage comes up, for processing large amounts of data.
Discussion Points
- Are there more things that we should do only in the BEAM/OTP?
- What are other scenarios where we should reach for a background job processing system?
- What library or setup can you recommend?