Background job processing in Elixir

sheharyarn · October 21, 2016, 8:57pm

I’m working on a Phoenix app that is supposed to offload jobs (that scrape websites and collect data) to a background worker thread. A quick Google search reveals 3 libraries:

All three use Redis as the backend, but Exq seems to be the most popular one, and I like the fact that it’s in Sidekiq format (it can be used with the Sidekiq UI). I would like to get the community’s opinion on this. Which one are you using and why? Can someone weigh in on the pros and cons of each library?

Also, how hard would it be to implement a simple background job worker in Elixir without relying on external libraries (or even Redis for that matter)? Is that a good decision?

gregvaughn · October 21, 2016, 9:28pm

I think you came around to the answer at the end there. IMHO it takes pretty specific requirements to make me even consider going outside the BEAM for concurrent processing.

That said, I’ve heard of Verk and IIRC it also supports the Sidekiq format. That is really only useful if you have a Ruby app creating background jobs to be processed in Elixir.

benwilson512 · October 22, 2016, 7:05pm

sasajuric · October 22, 2016, 8:51pm

This is a great analysis, but I’ll nitpick on a couple of points.

Persisting the queue to disk directly from Erlang is straightforward using DETS or a disk-based Mnesia table. I used the latter approach in my first system and had no problems with it.
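As a rough illustration of the DETS option, a disk-backed queue could be sketched along these lines. The `DiskQueue` module, the `:job_queue` table name, and the id scheme are hypothetical and are not taken from the system mentioned above; only the `:dets` calls are standard OTP.

```elixir
defmodule DiskQueue do
  # Hypothetical module: a tiny single-node job queue persisted with DETS.
  @table :job_queue

  # Opens (or creates) the DETS file at `path`.
  def open(path) do
    :dets.open_file(@table, file: String.to_charlist(path), type: :set)
  end

  # Stores a job under a time-based id and flushes it to disk.
  def enqueue(job) do
    id = {System.system_time(:nanosecond), System.unique_integer([:positive, :monotonic])}
    :ok = :dets.insert(@table, {id, job})
    # Force a write so the job survives a crash right after enqueueing.
    :ok = :dets.sync(@table)
    id
  end

  # Removes and returns the job with the smallest id.
  # DETS tables are unordered, so this reads everything and sorts;
  # acceptable for a sketch, too slow for a large queue.
  def dequeue do
    case @table |> :dets.match_object(:_) |> Enum.sort() do
      [] ->
        :empty

      [{id, job} | _rest] ->
        :ok = :dets.delete(@table, id)
        {:ok, job}
    end
  end

  def close, do: :dets.close(@table)
end
```

Jobs enqueued before a restart are still there after reopening the same file. Ordering relies on the system clock, so a clock that jumps backwards between restarts could reorder jobs.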

If Redis solves it by not being distributed, so can Erlang. A 3rd party non-distributed approach is not the simplest solution to a distributed problem. An in-tech (in this case pure Erlang or Elixir) solution could be simpler.

I haven’t used Redis for many years, but IIRC it actually has distributed support. However, that didn’t fare well in tests made by aphyr in his Jepsen series (disclaimer: they were done a few years ago, so maybe Redis has improved since then). My takeaway from reading those was that Redis offers no strong guarantees in a distributed setting. I can easily build my own in-Erlang/Elixir distributed solution that offers no real guarantees.

Therefore, I don’t see a compelling use case for Redis and wouldn’t recommend using it. However, some other 3rd party solution might be fine. Assuming it actually delivers on its promise, it can certainly be compelling, since unfortunately I’m not aware of any currently available high-level, easy-to-use, partition-tolerant distributed abstractions in Erlang. While there are some libraries, most notably Riak Core and Riak Ensemble, their usage is not straightforward, and you need to use some forks to make them work with the most recent Erlang.

I hope this will improve with time. Phoenix Presence is a great example of a distributed abstraction built using well understood algorithms, and at the same time it seems easy to use and well documented. I hope we’ll see more of such libraries in the future.

benwilson512 · October 24, 2016, 4:46pm

These are great points and I’m very excited you’ve brought them up. I’m in no way disputing that Erlang can do this, nor even that Erlang can make it easy to do this. My contention, however, is that for a large class of Elixir users who are relatively new to some of these approaches, there are not yet libraries that make using Elixir for this purpose as easy as using Redis.

While each of the Erlang solutions is in and of itself built upon proven technologies, using these technologies nonetheless offers various opportunities to shoot oneself in the foot, and it’s a fear of this that I think drives people to Redis.

Specifically:

Persisting the queue to disk directly from Erlang is straightforward using DETS or disk-based Mnesia table. I used the latter approach in my first system and had no problems with it.

Both of these suffer from distributed systems concerns. Mnesia is the closest to offering an out-of-the-box way to keep state synchronized across the nodes, but handling netsplits or other pathological cases with Mnesia is non-trivial. Mnesia also comes with a number of unexpected foot guns, e.g. async loading of tables by default.
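To make the async-loading point concrete: `:mnesia.start/0` returns before tables are necessarily loaded, so code that touches a table right away can fail. A minimal sketch of the usual guard, `:mnesia.wait_for_tables/2`, follows. The `JobStore` module, the `:jobs` table, and its attributes are hypothetical, and the sketch uses a RAM-only table on a single node to stay self-contained; the wait matters most for disk-backed tables after a restart.

```elixir
defmodule JobStore do
  # Hypothetical module and table name, for illustration only.
  @table :jobs

  def start do
    # Without a prior :mnesia.create_schema/1 call, Mnesia starts with a
    # RAM-only schema, which is enough for this sketch.
    :ok = :mnesia.start()

    case :mnesia.create_table(@table, attributes: [:id, :payload]) do
      {:atomic, :ok} -> :ok
      {:aborted, {:already_exists, @table}} -> :ok
    end

    # Block until the table is usable, or fail after 5 seconds.
    :ok = :mnesia.wait_for_tables([@table], 5_000)
  end

  def put(id, payload) do
    {:atomic, :ok} = :mnesia.transaction(fn -> :mnesia.write({@table, id, payload}) end)
    :ok
  end

  def get(id) do
    {:atomic, rows} = :mnesia.transaction(fn -> :mnesia.read(@table, id) end)

    case rows do
      [{@table, ^id, payload}] -> {:ok, payload}
      [] -> :error
    end
  end
end
```

This only addresses the startup race. It does nothing for netsplits, which remain the hard part.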

Persisting to disk from a GenServer can indeed be straightforward, but keeping this state synchronized across a cluster is far from straightforward.

If Redis solves it by not being distributed, so can Erlang

This 100%. However, there isn’t a “go install this lib and call this function and you’re done” level of solution at the moment. More to the point, there are a few possibilities WRT what making a particular service in an Erlang cluster “not distributed” means, and each of them can pose some challenges for new people.

One of the application servers is chosen as special, and only it runs the data store in addition to regular code. This presents devops challenges: if we need to migrate our app servers for some reason, the state on one of them matters but the state on the others does not. There are also reliability concerns, because bugs in the application code (such as excessive memory usage) can more easily take out the KV store.
A third server is set up and it runs ONLY the KV store. This is probably the closest to the Redis answer. Challenges here basically just amount to having the kind of deployment tooling required to have an Erlang cluster running with different applications on different servers. This is getting better.

In that last scenario, would Mnesia with disk copies on only this third node work? Is it susceptible to netsplits?

Recap:

We need:

Easy to use library. Phoenix Presence is indeed a very good example here.
Appropriate deployment practices to produce the frequently desired separation of canonical state and application code.

sasajuric · October 24, 2016, 9:35pm

I was mostly suggesting that non-distributed local-node caching is simple to do with Erlang. If you don’t need to distribute the state, then running background job processing (which was the original problem of this thread) is as easy as starting a process from the request handler.
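A minimal sketch of "starting a process from the request handler", assuming a `Task.Supervisor` is part of the supervision tree. The names `MyApp.JobSupervisor` and `BackgroundJobs` are hypothetical; in a Phoenix app the supervisor would normally be listed among the application's children rather than started inline as it is here.

```elixir
# Start a supervisor for fire-and-forget tasks (hypothetical name).
{:ok, _sup} =
  Supervisor.start_link(
    [{Task.Supervisor, name: MyApp.JobSupervisor}],
    strategy: :one_for_one
  )

defmodule BackgroundJobs do
  # Runs `fun` in a separate supervised process and returns immediately,
  # so the caller (e.g. a controller action) is not blocked.
  def run(fun) when is_function(fun, 0) do
    Task.Supervisor.start_child(MyApp.JobSupervisor, fun)
  end
end

# Usage from a request handler might look like:
#   {:ok, _pid} = BackgroundJobs.run(fn -> scrape(url) end)
# where `scrape/1` stands in for the actual scraping code.
```

Jobs started this way live only in memory: they are lost if the node goes down, and nothing retries a job that crashes. That trade-off is what the persistence discussion in this thread is about.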

My feeling is that people go to Redis because they used it before, everyone else uses it, and it’s seductively simple. Many, though, seem to disregard the fact that this thing has to be set up and configured somewhere, and it’s either a single point of failure (non-distributed), or otherwise unreliable.

A non-distributed in-memory cache can be as easy as an ETS table, or even an Agent for smaller throughputs. Making it persistent can be easily done with DETS or non-distributed Mnesia.
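Both in-memory variants can be sketched in a few lines. The module and table names below (`EtsCache`, `AgentCache`, `:local_cache`) are hypothetical. Note that an ETS table is owned by the process that creates it, so in a real app it would be created from a long-lived supervised process.

```elixir
defmodule EtsCache do
  # Hypothetical table name.
  @table :local_cache

  # Creates a named public table; reads go straight to ETS without
  # passing through a single process.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    :ok
  end

  def put(key, value) do
    true = :ets.insert(@table, {key, value})
    :ok
  end

  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end
end

defmodule AgentCache do
  # Same interface, but all reads and writes are serialized through one
  # process, which is why it suits smaller throughputs.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def put(key, value), do: Agent.update(__MODULE__, &Map.put(&1, key, value))
  def get(key), do: Agent.get(__MODULE__, &Map.fetch(&1, key))
end
```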

When it comes to managing a distributed state, the proper solution is not going to be simple with Erlang, but I don’t think distributed is ever simple or easy. Minimizing a cluster-wide state would be the first thing I’d consider. Otherwise, I’d look at libraries such as Phoenix Presence, riak_core, riak_ensemble, swarm, or syn, depending on the particular case.

Of course, reaching for 3rd party external components and databases is always a reasonable option, especially since using them can be simpler than evaluating Elixir/Erlang libraries and setting everything up properly in the code. Having a database as the single source of truth will work, but it will also be a single point of failure and a possible bottleneck. If you want to scale it, the db has to become distributed, which will then lead to similar challenges.