GenServer for robust background processing?

BitGonzo · February 17, 2018, 12:07pm

I’m having a little difficulty understanding where GenServer will fit into my planned architecture. I’m used to using background workers (Sidekiq in Ruby) along with persistent stores (Redis), retries, scheduling, etc., as well as queuing tech such as RabbitMQ.

From what I read after searching, GenServer is great for firing off lightweight background processes. But then I found this:

" If something goes wrong, you will not get notified back that processing of message you sent using cast failed. For example, you won’t know that sending email failed since you don’t wait for a reply"

Understandable. But let’s say I need these jobs to be reprocessed if they fail, or need to attach other such logic: is GenServer capable of that, or is its use case a little more limited? I’ve also been reading up on supervisors and fault tolerance, and came upon this:

“In Elixir, supervisors are tasked with restarting processes when they fail. Instead of trying to handle all possible exceptions within a process, the ‘Let it crash’ philosophy shifts the burden of recovering from such failures to the process’ supervisor.”

Let’s say I have a scraper job that may use an API which rotates proxies (Crawlera, for example) and can return either a success or a “too many requests”/banned response. Usually, I would have the job re-queue if unsuccessful and keep retrying, perhaps increasing the interval between retries. Are GenServers and supervisors a good fit for this kind of logic, or do I need to look into some other solution?

This suggests it might be a good fit for my use case, and would be a healthy introduction to OTP:

While this suggests otherwise, and that perhaps exq, toniq or verk would be a better solution:

https://medium.com/@cschneid/background-jobs-in-elixir-phoenix-60dddf4ce207

Basically, do GenServer, Supervisors and Mnesia give me all the tools I need to build durable jobs, retries with exponential backoff, and dynamically scheduled jobs?

Qqwy · February 17, 2018, 12:40pm

You basically ask two questions that are somewhat related, but separate.

Let me answer each in turn:

1. Is a GenServer the correct abstraction to use for background jobs and job scheduling?

The answer: Yes! GenServers are the tool to use for background jobs. Or, more accurately: GenServers combined in a supervision tree. The supervision strategy used in this (sub)tree specifies what should happen when a job fails: restart only that process, restart it together with all of its siblings, etc.
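To make that concrete, here is a minimal sketch of a GenServer started under a supervisor with the `:one_for_one` strategy. The `ScrapeWorker` module is hypothetical, made up for illustration. When the worker crashes, the supervisor starts a fresh process in its place; note that this is a restart with the original initial state, not a replay of the message that made it crash.

```elixir
defmodule ScrapeWorker do
  # Hypothetical worker, only here to have something that can crash.
  use GenServer

  def start_link(state), do: GenServer.start_link(__MODULE__, state, name: __MODULE__)

  @impl true
  def init(state), do: {:ok, state}

  # Simulates an unexpected failure inside the job.
  @impl true
  def handle_cast(:crash, _state), do: raise("simulated failure")
end

children = [
  # Expands to ScrapeWorker.child_spec(:initial_state), generated by `use GenServer`.
  {ScrapeWorker, :initial_state}
]

# :one_for_one  restarts only the child that crashed;
# :one_for_all  restarts every child of this supervisor;
# :rest_for_one restarts the crashed child and the children started after it.
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```

After `GenServer.cast(ScrapeWorker, :crash)`, `Process.whereis(ScrapeWorker)` will shortly return a different pid: the supervisor has replaced the dead process.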

GenServers are used a lot in the Elixir and Erlang world to separate the interface logic (like a web server, a terminal reader/writer or a GUI) from the business logic. In stark contrast to most other environments, the parts of the software that are in charge of the business logic can keep running even when no connections are open.
Games and simulations, for instance, have been built using this approach, where a GenServer keeps invoking itself a couple of times per second, even when no requests from the outside world are coming in.
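A minimal sketch of such a self-invoking GenServer, assuming a fixed 100 ms interval: it uses `Process.send_after/3` to schedule a `:tick` message to itself and reschedules on every tick. `Ticker` and its counter state are illustrative only; the periodic work would go where the counter is incremented.

```elixir
defmodule Ticker do
  # Illustrative GenServer that keeps "invoking itself" on a timer.
  use GenServer

  @interval_ms 100

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, 0, opts)

  # Client API: ask the server how many ticks it has processed so far.
  def count(pid), do: GenServer.call(pid, :count)

  @impl true
  def init(count) do
    schedule_tick()
    {:ok, count}
  end

  @impl true
  def handle_info(:tick, count) do
    # The periodic work (e.g. advancing a simulation step) would happen here.
    schedule_tick()
    {:noreply, count + 1}
  end

  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}

  # Sends :tick to this same process after @interval_ms milliseconds.
  defp schedule_tick, do: Process.send_after(self(), :tick, @interval_ms)
end

{:ok, pid} = Ticker.start_link()
Process.sleep(350)
IO.puts("ticks so far: #{Ticker.count(pid)}")
```

The loop keeps running whether or not anyone calls `Ticker.count/1`; outside requests are simply interleaved with the ticks in the process mailbox.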

If you want to perform periodic tasks, there are libraries that make this easier for you, like quantum (which itself uses GenServers under the hood!).

2. Can I perform incremental backoff strategies using a GenServer?

The answer is: Yes, but the logic might get a little convoluted. However, there are great libraries that abstract this away from you, like retry, which I’d highly recommend. (I have used it myself in the past, in a system running scrapers.)
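For comparison, this is roughly what hand-rolled exponential backoff looks like in plain Elixir. It is only a sketch of the idea, not the API of the retry library; `Backoff.retry/3` and the flaky fetch function below are hypothetical. It also blocks the calling process with `Process.sleep/1` while waiting, which is fine inside a dedicated job process but not inside a GenServer that has to keep serving other requests.

```elixir
defmodule Backoff do
  # Hand-rolled sketch of retry with exponential backoff (NOT the retry
  # library's API). `fun` must return {:ok, value} or {:error, reason}.
  def retry(fun, max_attempts \\ 5, base_delay_ms \\ 100) do
    do_retry(fun, 1, max_attempts, base_delay_ms)
  end

  defp do_retry(fun, attempt, max_attempts, delay_ms) do
    case fun.() do
      {:ok, _value} = ok ->
        ok

      # Out of attempts: give up and hand the last error back to the caller.
      {:error, _reason} = error when attempt >= max_attempts ->
        error

      # Wait, then try again with twice the delay: base, 2*base, 4*base, ...
      {:error, _reason} ->
        Process.sleep(delay_ms)
        do_retry(fun, attempt + 1, max_attempts, delay_ms * 2)
    end
  end
end

# Hypothetical flaky call: rate limited twice, then succeeds.
{:ok, calls} = Agent.start_link(fn -> 0 end)

flaky_fetch = fn ->
  attempt = Agent.get_and_update(calls, fn count -> {count + 1, count + 1} end)
  if attempt < 3, do: {:error, :too_many_requests}, else: {:ok, "page body"}
end

Backoff.retry(flaky_fetch, 5, 10) |> IO.inspect()
# => {:ok, "page body"}
```

A production version would usually also cap the delay and add random jitter so that many jobs don't retry in lockstep; that kind of bookkeeping is what such libraries take care of for you.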

BitGonzo · February 17, 2018, 12:44pm

Interesting, thanks very much.

This is essentially what I’m trying to avoid - reinventing the wheel with unnecessary boilerplate that will not add much value. But, as suggested, perhaps I’ll be able to get there with the help of some simple abstractions.

Considering this is my first venture into the Erlang ecosystem, it sounds like it would be wise to keep it simple while I learn the ropes, and only trade up if required later. I’ll see what I can do with retry, thanks!

dom · February 17, 2018, 1:14pm

Supervisors are for protecting your application against bugs (unexpected conditions). They’re not intended as a generic retry mechanism: they don’t support backoff, and they log crashes as errors, since the expectation is that you’ll want to fix them eventually. You should still use them, just not for the “too many requests” backoff which is part of your application’s expected behaviour and thus should be handled explicitly in the code.
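A sketch of what handling it explicitly could look like, under these assumptions: `RetryingScraper` is a hypothetical GenServer, `fetch` is a stand-in for the real proxy/API call that returns `{:ok, body}` or `{:error, reason}`, and the delays are arbitrary. Rate-limit and ban responses schedule another attempt with `Process.send_after/3` and a doubling (capped) delay, so the process never blocks while it waits; any other result falls through the `case` and crashes, which is the kind of unexpected condition the supervisor is there for.

```elixir
defmodule RetryingScraper do
  # restart: :transient means a supervisor would restart this process only
  # after a crash, not after it finishes normally.
  use GenServer, restart: :transient

  @initial_delay_ms 10
  @max_delay_ms 5_000

  # `fetch` is a hypothetical 1-arity function standing in for the real API call.
  def start_link({fetch, url, notify_pid}) do
    GenServer.start_link(__MODULE__, {fetch, url, notify_pid})
  end

  @impl true
  def init({fetch, url, notify_pid}) do
    # Kick off the first attempt right away.
    send(self(), :attempt)
    {:ok, %{fetch: fetch, url: url, notify: notify_pid, delay_ms: @initial_delay_ms}}
  end

  @impl true
  def handle_info(:attempt, state) do
    case state.fetch.(state.url) do
      {:ok, body} ->
        send(state.notify, {:scraped, state.url, body})
        {:stop, :normal, state}

      {:error, reason} when reason in [:too_many_requests, :banned] ->
        # Expected condition: schedule another attempt instead of crashing.
        Process.send_after(self(), :attempt, state.delay_ms)
        {:noreply, %{state | delay_ms: min(state.delay_ms * 2, @max_delay_ms)}}

        # Any other result raises a CaseClauseError: an unexpected condition,
        # left for the supervisor to deal with.
    end
  end
end

{:ok, calls} = Agent.start_link(fn -> 0 end)

# Fake fetch: rate limited on the first two calls, succeeds on the third.
fake_fetch = fn url ->
  n = Agent.get_and_update(calls, fn count -> {count, count + 1} end)
  if n < 2, do: {:error, :too_many_requests}, else: {:ok, "<html>#{url}</html>"}
end

{:ok, _pid} = RetryingScraper.start_link({fake_fetch, "https://example.com", self()})

receive do
  {:scraped, url, body} -> IO.puts("#{url} -> #{body}")
after
  1_000 -> IO.puts("no result yet")
end
```

The retry here is ordinary control flow, so nothing gets logged as an error while the API is rate limiting you.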

As for GenServers, they’re useful when you need a client/server model. The “Server” in GenServer means a process that loops, receiving requests from clients and acting on them, possibly sending back a response. For instance, let’s say you have a ScrapeManager process that’s responsible for kicking off / managing scraping jobs, and you have other processes (HTTP request handlers) requesting new jobs via that process. Then the HTTP handlers are clients, and the ScrapeManager is a server, and GenServer sounds like a good abstraction to use.
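A minimal sketch of that split, assuming a hypothetical `ScrapeManager` that only records job requests: the public functions are the client API and run in the caller's process, while the `handle_call/3` callbacks run inside the single server process, which handles one request at a time.

```elixir
defmodule ScrapeManager do
  use GenServer

  # --- Client API: these functions run in the caller's process ---

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, %{next_id: 1, jobs: %{}}, name: __MODULE__)
  end

  def request_job(url), do: GenServer.call(__MODULE__, {:request_job, url})

  def jobs, do: GenServer.call(__MODULE__, :jobs)

  # --- Server callbacks: these run inside the ScrapeManager process ---

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:request_job, url}, _from, %{next_id: id, jobs: jobs} = state) do
    # A real manager would also kick off (and monitor) the actual scrape here.
    jobs = Map.put(jobs, id, %{url: url, status: :queued})
    {:reply, {:ok, id}, %{state | next_id: id + 1, jobs: jobs}}
  end

  def handle_call(:jobs, _from, state), do: {:reply, state.jobs, state}
end

{:ok, _pid} = ScrapeManager.start_link([])

# Two clients (think HTTP request handlers), each in its own process.
["https://example.com/a", "https://example.com/b"]
|> Enum.map(fn url -> Task.async(fn -> ScrapeManager.request_job(url) end) end)
|> Enum.map(&Task.await/1)
|> IO.inspect(label: "replies")
```

Because every request goes through the one server process, job ids are handed out without races even when many clients call in at once.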

On the other hand, an individual scraping job probably doesn’t behave as a server: it doesn’t loop waiting for client requests, it just runs a specific job to completion. In that case there’s not much benefit to using a GenServer; a plain old process or a Task should do fine.
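A sketch of the Task-based alternative, assuming a hypothetical `Scraper.scrape/1` that either returns `{:ok, result}` or raises. Each job runs under a `Task.Supervisor` via `async_nolink/2`, so a crashing job is reported back to the caller as `{:exit, reason}` instead of taking the caller down with it. There is no server loop anywhere: each task runs its job once and exits.

```elixir
defmodule Scraper do
  # Hypothetical stand-in for the real scraping code; one URL "breaks".
  def scrape("https://example.com/broken"), do: raise("unexpected HTML")
  def scrape(url), do: {:ok, "scraped #{url}"}
end

{:ok, _sup} = Task.Supervisor.start_link(name: ScrapeTasks)

urls = ["https://example.com/1", "https://example.com/broken", "https://example.com/2"]

results =
  urls
  |> Enum.map(fn url ->
    # async_nolink: the caller monitors the task but is not linked to it.
    Task.Supervisor.async_nolink(ScrapeTasks, fn -> Scraper.scrape(url) end)
  end)
  |> Enum.map(fn task ->
    # Wait up to 5 s; if the task is still running, shut it down.
    case Task.yield(task, 5_000) || Task.shutdown(task) do
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
      nil -> {:error, :timeout}
    end
  end)

IO.inspect(results)
```

The broken URL shows up as an `{:error, reason}` entry (and a crash report in the log), while the other two jobs complete normally.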

RabbitMQ is itself built on Erlang/OTP and uses these same building blocks under the hood, so if you need persistence I would suggest just using it. Then you get a nice admin UI and a lot of features for free.

mbuhot · February 18, 2018, 6:19am

For durability, you need some kind of storage (mnesia/redis/rabbit/database) as well as a GenServer/GenStage to coordinate.

If you’re already planning to use Ecto/Postgres then I like to just stick the job queue in the database.
This allows simple transactional patterns, like signing up a new user and enqueuing the welcome email job in the same transaction.

BitGonzo · February 18, 2018, 10:20am

Thanks for the alternative approach. Looks much simpler…

What are the downsides to this solution? Firing off emails is just one use case: the core of my system will be responsible for firing off pingers and scrapers and likely many other jobs based on callbacks. And I’ll be using Postgres for managing long-term state and for backing my CRUD app. It seems that pushing all of this behaviour through Postgres might be a bit much. I could always use a separate dedicated instance, but then I wonder why I wouldn’t just use a dedicated worker solution.