Is GenServer used in real life applications? If so, how?

I regularly use gen_servers and gen_statems, but for most “regular” stuff you would do in a normal web app, as others pointed out, you might not need to: most of the internal architecture of what you’ll be using (say Cowboy/Phoenix, Ecto/DB connections, Oban, etc.) already uses processes to model its behaviour/functionality, so you basically don’t need to worry about them in those cases.

But I find plenty of uses for them. Some of the places where I’ve used them:

  • Modelling complex, data-intensive aggregations spanning millions of rows and producing literally millions of records holding aggregations over arbitrary timeframes and conditions (overlapping timeframes, etc.). Pure SQL would have been a nightmare, not only to model and use but to maintain. It meant organising the syncing from external datastores (concurrently, but with controlled concurrency), converting the data, storing it, then querying and organising it, taking into account that things need to happen in a defined set of steps: B’s only start after all of the A’s have finished syncing, C’s can start as soon as their B counterparts are finished, while others just have their own lifecycle, but it all needs to tie into a single “flow”. I won’t say “trivial”, because the problem itself was far from trivial, but in terms of logic it was very simple, explainable in a diagram, and testable: basically GenServers starting and monitoring others, batching things from the DB, calculating, storing, and exiting, then moving on to the next ones when those finished. All this while guaranteeing that DB timeouts/crashes and so on wouldn’t throw out hours of aggregations and restart the whole thing from scratch.

  • Using gen_statems/gen_servers to model fetching data from external APIs, say FB’s API or whatever you have. Again, this allows controlled concurrency and/or rate limiting: start X of them, and as each finishes start another, until you’ve gone through them all, then schedule another cycle. Each individual one is a sequence of steps: check you have a valid token; if not, request a new one and substitute it; then request the data, update what you need, and move on to the next one; if you can’t get a token, warn the user somehow; etc.

  • In a game I’m (still) re-writing, each game, each draft, etc. is a single process, either a gen_server or a gen_statem. The game is turn-based, so its process receives commands, processes them while guaranteeing they’re allowed (correct player, correct moves, enough resources, etc.), then broadcasts them back to the players and dumps them into a DB. The draft is 8 players at the same time: each player starts with a pool of choices, each pick has a timer, and after each pick that “pool” moves on to the next player, until all the pools are empty. It’s all concurrent, but you can only pick when you have a pool, and after you pick once, the player “behind” you makes their own pick and moves their pool to you. Again, everything is separated; each game, draft, etc. can do its own thing (dumping to the DB as a safety measure, broadcasting, etc.) without interfering with the others.
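The phased flow in the first bullet (all A’s finish before any B starts) can be sketched with plain spawns and monitors. This is a hedged sketch, not the original code; `run_job/1` is a hypothetical per-item worker that simply invokes the job:

```elixir
defmodule PhaseRunner do
  # Run each phase's jobs concurrently, and only move on to the next
  # phase once every job in the current one has finished (its process
  # has gone down), mirroring the "B's only start after all of the A's
  # have finished" rule described above.
  def run(phases), do: Enum.each(phases, &run_phase/1)

  defp run_phase(jobs) do
    refs =
      for job <- jobs do
        {_pid, ref} = spawn_monitor(fn -> run_job(job) end)
        ref
      end

    # Phase boundary: block until a :DOWN arrives for every worker.
    Enum.each(refs, fn ref ->
      receive do
        {:DOWN, ^ref, :process, _pid, _reason} -> :ok
      end
    end)
  end

  # Placeholder worker: here a job is just a zero-arity function.
  defp run_job(job) when is_function(job, 0), do: job.()
end
```

A real version would also handle `:DOWN` reasons other than `:normal` (retrying or aborting the phase) so a DB timeout doesn’t silently drop a batch.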
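The “start X, as each finishes start another” cycle from the second bullet maps neatly onto `Task.async_stream/3`, which keeps at most `max_concurrency` items in flight. The sketch below assumes hypothetical `fetch/1` and `refresh_token/1` helpers standing in for real API calls:

```elixir
defmodule ApiSync do
  @max_concurrency 5

  # Walk through all accounts with bounded concurrency: at most
  # @max_concurrency requests in flight, and a new one starts as
  # soon as any finishes.
  def sync(accounts) do
    accounts
    |> Task.async_stream(&sync_one/1,
      max_concurrency: @max_concurrency,
      timeout: :timer.seconds(30)
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Each item is a small sequence of steps: try the fetch; on an
  # expired token, refresh once and retry.
  defp sync_one(account) do
    case fetch(account) do
      {:error, :invalid_token} -> account |> refresh_token() |> fetch()
      ok -> ok
    end
  end

  # Stubs standing in for real HTTP calls.
  defp fetch(%{token: :expired}), do: {:error, :invalid_token}
  defp fetch(account), do: {:ok, {:data, account.id}}
  defp refresh_token(account), do: %{account | token: :fresh}
end
```

Scheduling the next cycle (e.g. with `Process.send_after/3` from an owning GenServer) and warning the user on a failed refresh are left out for brevity.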

There are some other cases where I’ve used them, but basically it comes down to concurrency control and the need for access serialization. Say you’re linking an account to something external (Stripe or PayPal, whatever). If you make that pass through a process (one per user) that acts as a serialization point, and 2 (or more) requests from the same user come in at the same time (even from someone trying to poke holes in your system), you can easily model it so that each request is dealt with only after the previous one. By the time the second one gets processed, you can bail out immediately because you already have the result from the first one, without having to rely on ad-hoc locks or whatever, as long as the interactions with that particular resource are modeled through that process. The process can then shut itself down after X minutes of inactivity, and it will start & load again whenever a new request comes in, guaranteeing that it’s always consistent.
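A minimal sketch of such a per-user serialization point; all names here (`LinkSerializer`, `:link_account`) are hypothetical, and in a real app you would register one process per user via a `Registry` and actually call the external service:

```elixir
defmodule LinkSerializer do
  use GenServer

  # Stop after 5 minutes without messages; the next request starts a
  # fresh process (which would reload its state from the DB).
  @idle_timeout :timer.minutes(5)

  # In a real app this would be registered per user (e.g. via Registry),
  # so all requests for the same user hit the same process.
  def start_link(user_id),
    do: GenServer.start_link(__MODULE__, user_id)

  def link_account(pid, params),
    do: GenServer.call(pid, {:link_account, params})

  @impl true
  def init(user_id),
    do: {:ok, %{user_id: user_id, result: nil}, @idle_timeout}

  @impl true
  def handle_call({:link_account, _params}, _from, %{result: nil} = state) do
    # First request: talk to the external service (stubbed here).
    result = {:ok, :linked}
    {:reply, result, %{state | result: result}, @idle_timeout}
  end

  def handle_call({:link_account, _params}, _from, state) do
    # Duplicate request: calls are serialized in the mailbox, so by the
    # time this runs we already have the result and can bail out.
    {:reply, state.result, state, @idle_timeout}
  end

  @impl true
  def handle_info(:timeout, state), do: {:stop, :normal, state}
end
```

The returned timeout in each callback resets the idle clock, so the process only dies after a genuinely quiet period.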

3 Likes

I am working on a small series of blog posts where we will go into the weeds with GenServers and when to use them (and when not to).

Basically, my thinking on the subject is that these are great building blocks for your application infrastructure. If you have to build a piece of infrastructure for your app, then you will probably need some. Think: if you want to build a connection pool, a pool of workers, or a background job processing library from scratch, you would use a combination of Supervisors and GenServers, possibly mixed with Tasks, Agents, or some higher-level abstractions.

And oftentimes you do have to build your own piece of infrastructure like that.

Where GenServers are generally not useful is in implementing business logic for your application. Again, there are exceptions to this rule, but in 99% of cases, when you write your business logic you just write functions in some modules.

Having said that, you may find yourself in need of writing a piece of infrastructure that wraps your business logic, say in a saga pattern, handling retries and multiple success/failure paths and dependencies. If you want to build something like that into your app, OTP primitives will be exceptionally helpful.

7 Likes

I would say that in web apps you do not use gen_servers directly, because a lot of the tools you use are themselves built on gen_servers. Indirectly their use is enormous; there is just no need to use them directly.

5 Likes

This example is more academic :slight_smile: I originally planned to have a more realistic example based on a real project I was working on at the time (live betting domain), but I soon found out that it would require many more pages and make the read significantly denser, because in addition to treating the large topic of concurrency with OTP, a lot of space, and consequently the reader’s energy, would have to be spent on grasping the problem and its nuances. So for better or worse I opted for this toy example, which I consider “barely relatable”, in the sense that it’s not completely unrelatable (i.e. there are no foos and bars, or mammals and dinosaurs), but it’s not really realistic. IRL, I’d typically model this by using a regular database, and if needed add a cache layer powered by GenServer and ETS (there are a few libraries for this).

In general I do agree that in most cases you can get away without directly using GenServer, because they are wrapped by libraries from the ecosystem. Examples include Supervisor (internally implemented using GenServer), Phoenix channels, Oban persistent queues, periodic job schedulers, caches, rate limiters & load regulators, pools, etc.

That said, I think that GenServer is very useful to learn because it’s quite versatile, and IMO it’s not very difficult to understand. Once you know it, you get a very useful tool in your toolbox that can simplify your life when the existing abstractions from the ecosystem don’t quite cover your particular needs.

This is how I think about it too! It’s a bit of a simplistic comparison, but I agree that in many non-BEAM languages you’d often need to reach for microservices where in a BEAM language a GenServer would suffice.

This is certainly the problem with mainstream microservices: they are integrated at the OS level, and such integration becomes quite complex as the number of services grows. With GenServers, however, you do the split at the code level, with everything managed in a single monolithic project using the same piece of technology (same language & libraries), so day-to-day dev, testing, and operations are much easier and completely manageable. I actually posted an article on this topic last week. It doesn’t talk about GenServer directly, but the library it discusses relies a lot on GenServers.

So in general, you shouldn’t be scared of running many GenServer processes, or processes in general. Even the default app generated by Phoenix uses more than 100 processes (fire up the observer to visualize it). Of course, pay attention to using processes properly; I’ve written a bit about that here.

17 Likes

Yes, they are. But generally you’d only use GenServers (and processes) for runtime error separation and NOT for code organization (which is something “microservices” are sometimes used for). If you haven’t already read it, I would highly recommend Sasa’s excellent “To spawn, or not to spawn?”. While it is more about processes than about GenServers, it will help solidify your thinking in this area.

3 Likes

So in my job, I orchestrate virtual machines living on hardware. Think VMware or EC2, except much, much simpler. I use gen_statems (like gen_servers, but with an additional state machine) to model the hosts and the virtual machines. The hosts are permanent gen_statems, and the vms are transient gen_statems. So, for example, if a user makes a request, the gen_statem receives the transaction request, locks the system to prevent user request contention, waits to be released by the requester (or picked), then stays locked until the virtual machine successfully boots. The host also, out-of-band, periodically queries the actual physical host to find out what it looks like and updates its state accordingly. This is very easy to write on the BEAM (I have written concurrent code with C++ actors and Go goroutines, and the BEAM is way better). I suppose with a lot of boilerplate and much more latency and difficulty you could do this with a database, but I don’t have a database in my datacenter, for now.

Another thing worth mentioning is that the BEAM lets you model failure domains in a sane fashion, so, let’s say my connection to check up on the host dies. In that case, the host gen_statem AND all of its vm gen_statems are also killed and don’t come back until the host is contactable. This prevents users from issuing inconsistent, conflicting requests to those components, and I didn’t have to write complex logic to handle those cases and I can sleep at night.
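The link semantics behind this failure-domain behaviour fit in a few lines. This toy script (not the actual orchestrator code) spawns “vm” processes linked to a “host” process; killing the host takes the vms down with it, with no cleanup logic:

```elixir
# Each vm is spawn_link'ed from inside the host process, so they share
# a failure domain: the host's exit signal propagates to all of them.
parent = self()

host =
  spawn(fn ->
    vms = for _ <- 1..3, do: spawn_link(fn -> Process.sleep(:infinity) end)
    send(parent, {:vms, vms})
    Process.sleep(:infinity)
  end)

vms =
  receive do
    {:vms, vms} -> vms
  end

# Kill the host; the linked vms (which don't trap exits) die too.
Process.exit(host, :kill)
Process.sleep(50)

Enum.map(vms, &Process.alive?/1)
# => [false, false, false]
```

In a real system you’d get the same effect declaratively by putting the vm processes under a supervisor that is itself linked to the host process.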

3 Likes

I think of Erlang/Elixir as fundamentally two things:

  • A great (mostly pure) functional programming language
  • A great implementation of the actor programming model (this is the messages and processes bit)

The Actor Model has a wikipedia page, but it’s got a lot of Computer Sciencey terms. What it boils down to is this:
Break your program up into a series of small boxes (Actors). Each Actor can have its own state, but the only thing it can do to affect other actors is send them messages.
This turns out to be a great way of building a big complicated program. You don’t have to understand the whole thing, you can understand each actor one at a time because they operate independently of each other.

We can build a lot of code with pure (side effect free) functional programming. But in most applications, somewhere you need to have ‘side effects’. For example, if you want to count how many times something is called you can’t do that with a pure function call - the output can only depend on the inputs, so you’d get the same answer back every time.
Instead we create an actor. It has a little bit of internal state (‘count’), and it recognises one message ‘get count’. Every time it sees that message it sends back a message with the count value from its state, and increments that state by one. The next time you make the same call, you get a different answer.

GenServers are the normal abstraction that we use to build such actors. They handle all the fiddly bits to allow us to do the following standard sequence:

  • Receive message
  • Send Reply
  • Update state
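That counter actor, and the receive/reply/update sequence above, fits in a few lines of GenServer (module and function names are mine, chosen for illustration):

```elixir
defmodule Counter do
  # Minimal sketch of the counter actor described above: one piece of
  # state (the count), one message (:get_count) that replies with the
  # current value and then increments the state.
  use GenServer

  def start_link(initial \\ 0),
    do: GenServer.start_link(__MODULE__, initial)

  def get_count(pid), do: GenServer.call(pid, :get_count)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:get_count, _from, count) do
    # Receive the message, send the reply (current count), update the
    # state (count + 1) -- the standard sequence from the list above.
    {:reply, count, count + 1}
  end
end
```

So the same call gives a different answer each time:

```elixir
{:ok, pid} = Counter.start_link()
Counter.get_count(pid)  # => 0
Counter.get_count(pid)  # => 1
```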

The other reason you might create a GenServer is when you need to do work in parallel - maybe a long running or periodic job, or because you want to handle lots of network connections at the same time.

Every application needs some GenServers to hold state or work in parallel, but as others have said, that doesn’t mean you are necessarily writing them. If you write a Phoenix web app, the Process/GenServer setup is done by the web framework. You are just writing some pure functional code that runs in that context and boils down to producing an updated copy of the Plug.Conn struct (other state is stored in the DB).

On the other hand, if your webserver needed to periodically update some internal information to then present (e.g. pulling weather data from another service), then you might create a GenServer to perform that task once an hour.
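A hedged sketch of that hourly refresher; `fetch_weather/0` and the module name are placeholders standing in for a real HTTP call:

```elixir
defmodule WeatherPoller do
  # GenServer that refreshes external data on a timer and serves the
  # cached copy to callers, so web requests never wait on the upstream API.
  use GenServer

  @interval :timer.hours(1)

  def start_link(opts \\ []),
    do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  def current, do: GenServer.call(__MODULE__, :current)

  @impl true
  def init(_opts) do
    # Fetch once on startup, then schedule the next refresh.
    schedule_refresh()
    {:ok, fetch_weather()}
  end

  @impl true
  def handle_call(:current, _from, weather), do: {:reply, weather, weather}

  @impl true
  def handle_info(:refresh, _stale) do
    schedule_refresh()
    {:noreply, fetch_weather()}
  end

  defp schedule_refresh, do: Process.send_after(self(), :refresh, @interval)

  defp fetch_weather do
    # Placeholder: a real version would call the weather service here.
    %{temp_c: 21, fetched_at: DateTime.utc_now()}
  end
end
```

Rescheduling from `handle_info/2` (rather than using a fixed-rate timer) means a slow fetch can never pile up refresh messages in the mailbox.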

4 Likes

Just to add my two cents, I regularly use GenServers in web services to manage ETS caches for example. I just wrote one to serialize accesses to an external API (Gravatar) and cache the results in ETS: https://gitlab.com/code-stats/code-stats/-/blob/050351ec17612549035faf2d60e6324846410542/lib/code_stats_web/gravatar/proxy.ex
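The general shape of that pattern, with reads served straight from ETS and misses serialized through the owning GenServer, looks roughly like this. The names are hypothetical, not taken from the linked module:

```elixir
defmodule AvatarCache do
  # The GenServer owns a protected ETS table: callers read it directly
  # (no message passing on a hit), while cache misses are funneled
  # through the server so concurrent misses for the same key don't
  # all hit the external API.
  use GenServer

  @table __MODULE__

  def start_link(_opts),
    do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # `fun` computes the value on a miss (e.g. an external API call).
  def fetch(key, fun) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> value
      [] -> GenServer.call(__MODULE__, {:fill, key, fun})
    end
  end

  @impl true
  def init(nil) do
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:fill, key, fun}, _from, state) do
    # Re-check inside the server: another caller may have filled the
    # key while this request was waiting in the mailbox.
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        {:reply, value, state}

      [] ->
        value = fun.()
        :ets.insert(@table, {key, value})
        {:reply, value, state}
    end
  end
end
```

A production version would also expire entries; this sketch only shows the serialization point.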

4 Likes

Pretty good code by the way. Well done.

1 Like