Background jobs and deployment

daniel_torres · November 27, 2016, 10:47am

Hi there,

I’m currently investigating Elixir and Phoenix as an alternative to Ruby/Rails for a new project I’m working on.

It’s mainly an API backend and a mobile frontend, with websocket connections between the two, but most data is going through the JSON API.

However, as I learn more about Elixir and OTP, I realised we could use Tasks and GenServers for “background jobs” instead of a full-fledged job queue as you’d normally do with Rails. Background jobs in our case being things like:

push notifications sent to the mobile clients
emails
after-save calculations for our algorithms
image processing
search indexing

Now to the real question(s): say we’d use GenServers (potentially together with ETS/Mnesia) to do those things. If we deploy using Erlang/Elixir releases, the state in the GenServers should persist as far as I understand it, so we shouldn’t lose any unprocessed jobs. Correct?

However, if we deploy using throwaway Docker containers, those unprocessed jobs would be gone after each deploy.

Long story short, how would you guys handle such a thing? The aforementioned jobs are not super critical (a missed push notification is not the end of the world), but we’d certainly prefer a stable solution. Docker would also be nice from a DevOps perspective, that’s why I’m asking.

We could of course use an external queue with redis/rabbit, but that doesn’t feel like the erlang way, does it?

Hope that’s somewhat clear - thanks in advance!

shavit · November 27, 2016, 12:35pm

One solution is not to use background jobs in Elixir / Phoenix.

Let’s say you have a program that runs on a machine in a private network. Then you can write a worker that subscribes to a job queue using Redis, and create jobs from the front end with Elixir / Phoenix.

If you don’t want to install other services like RabbitMQ or Redis, you can still use PostgreSQL or MongoDB with capped collections.

gon782 · November 27, 2016, 12:45pm

You can still write to disk and read from disk no matter what you do. If you don’t use only ram_copies for your Mnesia tables and instead use disc_copies, the table is kept in memory and also persisted to disk. Also, you can make it so that any gen_servers persist their state to disk in terminate/2.
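To illustrate the last point, here is a minimal sketch of a GenServer that writes its state to a file when it shuts down and reads it back on start. The module name PersistentQueue, its functions, and the file path are all hypothetical. Note that terminate/2 only runs on an orderly shutdown (and the process has to trap exits for a supervisor shutdown to reach it), so a hard kill of the VM or container would still lose the jobs.

```elixir
defmodule PersistentQueue do
  # Hypothetical module: keeps a list of jobs in memory and
  # saves it to disk when the process terminates cleanly.
  use GenServer

  def start_link(path), do: GenServer.start_link(__MODULE__, path)
  def push(pid, job), do: GenServer.call(pid, {:push, job})
  def jobs(pid), do: GenServer.call(pid, :jobs)

  @impl true
  def init(path) do
    # Trap exits so terminate/2 is also called when a supervisor stops us.
    Process.flag(:trap_exit, true)

    # Restore jobs saved by a previous run, or start with an empty list.
    jobs =
      case File.read(path) do
        {:ok, binary} -> :erlang.binary_to_term(binary)
        {:error, _reason} -> []
      end

    {:ok, %{path: path, jobs: jobs}}
  end

  @impl true
  def handle_call({:push, job}, _from, state) do
    {:reply, :ok, %{state | jobs: state.jobs ++ [job]}}
  end

  def handle_call(:jobs, _from, state) do
    {:reply, state.jobs, state}
  end

  @impl true
  def terminate(_reason, state) do
    # Serialize the pending jobs so the next start can pick them up.
    File.write!(state.path, :erlang.term_to_binary(state.jobs))
  end
end
```

Stopping the process with GenServer.stop/1 and starting it again with the same path should bring back the jobs that were pushed before the stop.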

Qqwy · November 27, 2016, 12:45pm

I think that the Erlang/Elixir way would be to decide what information only needs to be ephemeral (it is not a problem if it gets lost when the application restarts), and what information should be stored persistently. In other languages, there often are no good ways to work with information in an ephemeral manner, so people end up always storing everything, which results in ginormous databases and slower applications.

The persistent information can be stored without leaving the BEAM using, for instance, DETS. Another possibility would be to inform your application that you are going to restart it, and ask the GenServers that handle important information to write a backup of their current state to a file at that time.
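As a rough sketch of the DETS option, the wrapper below stores jobs in a disk-backed table using the :dets module that ships with Erlang/OTP. The JobStore module, the :job_store table name, and the {id, job} tuple layout are assumptions made up for this example.

```elixir
defmodule JobStore do
  # Hypothetical helper around a DETS table (disk-based term storage).

  # Opens the table file, creating it if it does not exist yet.
  # Returns {:ok, table} on success.
  def open(path) do
    :dets.open_file(:job_store, file: String.to_charlist(path), type: :set)
  end

  # Stores a job under the given id; an existing entry with that id is replaced.
  def put(table, id, job), do: :dets.insert(table, {id, job})

  # Looks a job up by id.
  def get(table, id) do
    case :dets.lookup(table, id) do
      [{^id, job}] -> {:ok, job}
      [] -> :error
    end
  end

  # Closes the table so that everything is flushed to disk.
  def close(table), do: :dets.close(table)
end
```

Anything written with put/3 should still be readable with get/2 after the table has been closed and opened again, which is the property that matters across restarts. With throwaway containers the file would have to live on a mounted volume to survive a deploy.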

An interesting side note is that RabbitMQ itself is built with Erlang.

daniel_torres · November 27, 2016, 3:22pm

Alright, sounds like persisting to disk is the best option here, apart from using a dedicated queue with redis/rabbit.

Thanks for all the answers!

Qqwy · November 27, 2016, 8:32pm

By the way, it is probably quite possible to create a queue inside Elixir using the new GenStage/Flow. One possibility here would be to have a ProducerConsumer that stores its contents on disk when its subscribers are not yet ready for new requests.