You may not need GenServers and supervision trees

PragTob · March 7, 2018, 10:37am

Hey everyone,

This has been brewing in my head for some time, and it came up again while reading Adopting Elixir.

GenServers, supervisors, etc. are great technologies that help you solve problems. They’re one of the most special things about Elixir/Erlang. As a result, lots of conference talks, blog posts, etc. focus on them, and it seems everyone wants to use them.

However, do you need them all the time? At least while using a framework (like Phoenix), chances are you don’t. That is, of course, until you have a problem that they help you solve.

Building a relatively standard CRUD web application with Phoenix? No need.
Just using channels for a chat-like application in Phoenix? You’re good.

The hidden detail, of course, is that you are using GenServers and friends without even knowing it: Phoenix runs every request and every channel in its own process, and Ecto uses poolboy for your database connections. It’s already parallelized and you don’t need to take care of it. That’s the beauty of it. What I’m saying is that in the standard situation the ecosystem takes care of you.

Why am I picking up this topic?
It feels like we talk so much about GenServers etc. that people who come to Elixir feel like they need to use them or they are not “really” using Elixir. I hear people say something to the tune of “We’re still using this like Rails - we should use GenServers” without any need (granted, they mostly don’t know what Phoenix & friends already do under the hood). At worst (as I’ve seen in some questions here), people create a single GenServer that essentially all traffic has to go through, complicating their code while also adding an unneeded bottleneck. Or maybe they just complicate their code, which is also bad.
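To make that bottleneck concrete, here is a minimal sketch, not taken from any real app: `SlowQuery`, `DbGateway` and `BottleneckDemo` are hypothetical names, and `Process.sleep/1` stands in for a database query. Routing every query through one named GenServer serializes the callers, while calling the function directly from each caller lets them run concurrently, closer to what a connection pool allows:

```elixir
defmodule SlowQuery do
  # Hypothetical stand-in for a database round trip: just sleeps for `ms` milliseconds.
  def run(ms) do
    Process.sleep(ms)
    :ok
  end
end

defmodule DbGateway do
  # Anti-pattern sketch: a single named GenServer that every query is funneled through.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def query(ms), do: GenServer.call(__MODULE__, {:query, ms})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:query, ms}, _from, state) do
    # A GenServer handles one message at a time, so concurrent callers queue up here.
    {:reply, SlowQuery.run(ms), state}
  end
end

defmodule BottleneckDemo do
  # Runs `fun` in `n` concurrent tasks and returns the elapsed wall-clock time in microseconds.
  def time_concurrent(n, fun) do
    {micros, _results} =
      :timer.tc(fn ->
        1..n
        |> Enum.map(fn _ -> Task.async(fun) end)
        |> Task.await_many()
      end)

    micros
  end
end
```

After `DbGateway.start_link([])`, `BottleneckDemo.time_concurrent(10, fn -> DbGateway.query(50) end)` takes at least 500 ms because the ten calls are handled one after another, while `BottleneckDemo.time_concurrent(10, fn -> SlowQuery.run(50) end)` finishes in roughly the time of a single fake query.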

To get back to “Adopting Elixir”, here is an example from it:

A new developer team started building their Phoenix applications. They had always heard GenServers could be treated like microservices but even tinier. This “wisdom” led them to push all of their database access control to GenServers.
(…)
performance was abysmal. Under high-enough load, some pages took 3 seconds to render because they built a bottleneck where none existed. They defeated Ecto connection pools because all access happened through a single process.
In essence, they made it easy to create global, mutable variables in Elixir. They essentially crippled the single biggest advantage of functional languages, for no gain whatsoever.

That is also something I’ve seen a bunch of times. The book also provides some guidance on what GenServers are best used for:

Model state accessed by multiple processes.

Run multiple tasks concurrently.

Gracefully handle clean startup and exit concerns.

Communicate between servers.
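For the first item on that list, a sketch of what “state accessed by multiple processes” can look like: a hypothetical `HitCounter` GenServer owns a number, and other processes only touch it by sending messages. Because the GenServer processes one message at a time, concurrent increments can’t race with each other:

```elixir
defmodule HitCounter do
  # Hypothetical example: one process owns the count; other processes
  # read and update it only through messages.
  use GenServer

  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)

  # Synchronous call, so the caller gets the new count back.
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

Firing 100 `Task`s at `HitCounter.increment/1` and awaiting them, then calling `HitCounter.value/1`, returns 100.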

So, what do I want in the end?

Well, I want to discuss this with you all and hear your opinions!

I think we should make it clearer that you don’t have to use GenServers and that doing so might actually be harmful. My two production applications don’t include a single GenServer written by us, and they run fine. In general, the ecosystem takes good care of you, so you’re using them without realizing it (which is good, IMO).

I’m not saying you shouldn’t learn about GenServers. You should. But know when to use them and when not to.

Lastly, if you disagree, I want you to scream at me and teach me the error of my ways.

jordiee · March 7, 2018, 2:55pm

I agree with you that for the most part you don’t need to use any OTP stuff in servers, but people should also understand some of the awesome things you can do with it: having a distributed cache (Mnesia), or tracking state in real time in GenServers without the need to hit a database. Also, depending on the structure of your code, you don’t “have” to take a performance hit. Running everything through a single GenServer… sure, it’s going to be quite slow. Spinning up a new GenServer per “user”… your performance is probably still going to be pretty good, and now you can track things in memory for users. Again, I agree that 99% of the time you do not need to use anything OTP-related, but when you are in that 1%, it’s really a game changer that makes Elixir as great as it is.

I, for example, heavily use OTP things (Mnesia, ETS, supervisors, GenServers) in a SaaS product I am working on, and it has got to the point where I do not think I could get the same performance out of the box with any other language.
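A rough sketch of the “GenServer per user” idea, assuming one process per user ID, registered in a `Registry` and started on demand under a `DynamicSupervisor`. `UserSession` and `UserSessions` are made-up names. Each user’s state lives in its own process, so users don’t queue behind each other the way they would with one shared GenServer:

```elixir
defmodule UserSession do
  # Hypothetical per-user process holding in-memory state for a single user.
  use GenServer

  def start_link(user_id), do: GenServer.start_link(__MODULE__, user_id, name: via(user_id))

  # Processes are looked up by user ID through the Registry, not by pid.
  def via(user_id), do: {:via, Registry, {UserSession.Registry, user_id}}

  def track(user_id, event), do: GenServer.call(via(user_id), {:track, event})
  def events(user_id), do: GenServer.call(via(user_id), :events)

  @impl true
  def init(user_id), do: {:ok, %{user_id: user_id, events: []}}

  @impl true
  def handle_call({:track, event}, _from, state) do
    {:reply, :ok, %{state | events: [event | state.events]}}
  end

  def handle_call(:events, _from, state), do: {:reply, Enum.reverse(state.events), state}
end

defmodule UserSessions do
  # Starts the Registry and the DynamicSupervisor that user processes run under.
  def start_link do
    children = [
      {Registry, keys: :unique, name: UserSession.Registry},
      {DynamicSupervisor, strategy: :one_for_one, name: UserSession.Supervisor}
    ]

    Supervisor.start_link(children, strategy: :one_for_all)
  end

  # Starts the user's process if needed; returns the existing pid otherwise.
  def ensure_started(user_id) do
    case DynamicSupervisor.start_child(UserSession.Supervisor, {UserSession, user_id}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
    end
  end
end
```

Because the `Registry` keys are unique, starting a session that already exists just returns its pid. This state lives only in memory, so it is gone if the process or node goes down.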

PragTob · March 7, 2018, 3:01pm

Thanks for your response! Totally agree, it can be a total game changer in what you’re able to do with it, and I myself still have to get acquainted with large parts of OTP. Knowing that all of this has proven reliability and performance in large systems is also very comforting.

Oh definitely - I didn’t want to come across as saying they’re bad for performance. On the contrary, I usually think of them as improving performance (through parallelization). The fact that they’re so incredibly cheap to spin up is just sprinkles on the cake!

yurko · March 7, 2018, 3:15pm

I think that’s “kinda” true. You sure don’t have to use them for the sake of using them, but the only way to feel their awesomeness is to actually use them; otherwise you’ll never get the feeling of needing them (if that makes sense).

We have a bunch of apps, and only the first one, a tiny microservice, actually has no GenServers. Even things that would otherwise need cron jobs / Sidekiq benefit from them, and these are not exotic features.
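As an example of replacing a simple cron job, a minimal sketch of a GenServer that reschedules itself with `Process.send_after/3`. `PeriodicJob` and its `:job`/`:interval_ms` options are hypothetical:

```elixir
defmodule PeriodicJob do
  # Hypothetical cron-like worker: runs `job` every `interval_ms` milliseconds.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{job: Keyword.fetch!(opts, :job), interval: Keyword.fetch!(opts, :interval_ms)}
    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:tick, state) do
    state.job.()
    # Schedule the next tick only after this run has finished.
    schedule(state.interval)
    {:noreply, state}
  end

  defp schedule(ms), do: Process.send_after(self(), :tick, ms)
end
```

Since the next tick is scheduled after the job finishes, the interval drifts by the job’s run time; this is an “every N ms” loop rather than a real cron schedule. Placed under a supervisor, it is restarted if the job crashes.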

sync08 · March 7, 2018, 3:35pm

I almost wrote about this not long ago, but while writing it out I came to some unfortunate conclusions. I’ll just cut right to the chase and say that if you’re not using Elixir for the GenServers, supervisors, etc., then it’s probably not worth using the language.

As a language, I love Elixir. It has made some compromises but nothing major. The only real issue I have with it is that it’s dynamically typed. I’m pretty sure I linked @PragTob’s blog post in a thread the other day talking about this.

As much as I wish I could just use the language and forget about complex supervision trees etc., it just doesn’t make sense. It’s not that fast, it’s not that safe, it’s not light on system resources, its editor tooling isn’t great, it’s not suitable for scripting, it’s not suitable for GUIs and, most importantly, its growth has peaked. The only thing it currently makes sense to use Elixir for is fault-tolerant server-side apps.

cpgo · March 7, 2018, 3:44pm

If I understood @PragTob’s point, it’s that if you are using something like Phoenix or Ecto, you are already taking advantage of the fault-tolerant side of Elixir, and you don’t need to try to shove OTP stuff where you don’t need it.

As a very bad comparison: I can use Rails without any Ruby metaprogramming black magic in my code, but Rails is still using it (very) heavily for me.

sync08 · March 7, 2018, 3:58pm

While I understand that, having just Phoenix and Ecto be fault tolerant buys you nothing. Any modern web framework (regardless of language) can handle bad requests without bringing down the server.

My point is that if you’re not using the built in supervision tools then there is pretty much no benefit to using Elixir but many downsides.

jordiee · March 7, 2018, 4:00pm

Not completely true. You at least get a language that runs across cores pretty nicely out of the box. I’m not saying other languages can’t do that, but it’s really nice.

amnu3387 · March 7, 2018, 4:18pm

So what are the many downsides to it?

Previously you mentioned it “is not that fast/light/safe”, which can be true of anything else as well; it basically depends on what you compare it against, nothing else. It can also be much faster/lighter/safer. So what are your points of comparison?

I personally feel very happy writing in it - I just wrote a queue system for a game in half a bunch of lines, split between GenServers and ETS tables…

edmz · March 7, 2018, 4:33pm

I agree with your feeling, and I would like to extend this to processes too. They say that when you are a hammer, everything looks like a nail. So I see people coming to Elixir, becoming hammers, trying to use processes for everything, and getting frustrated in the process (ha!, no pun intended).

It’s like getting into Ruby and somehow convincing/forcing yourself to use threads everywhere, when in real life you wouldn’t really be using them that much.

I think it’s an issue of communication.

sync08 · March 7, 2018, 4:37pm

Yeah but I get that with many languages now. I’m still half in the C# world and despite the hype about Erlang/Elixir multi-core usage, it’s actually not that great at it. I’ve done some benchmarks and saturating all cores caps out at about 75% CPU usage. In C# I can get almost 100% usage. Even if I could push Elixir to utilize all 100% (you can’t, it’s fundamental BEAM overhead) you still won’t see the performance come anywhere close. I’m talking orders of magnitude difference in most tasks.

What I will say is that concurrency in Elixir is way easier and safer.

JEG2 · March 7, 2018, 4:46pm

Is it possible for you to show an example of code that C# parallelizes so much better?

sync08 · March 7, 2018, 4:59pm

If we’re sticking with the topic of this thread then the most basic and relevant demo would be to compare a newly created ASP.NET Core web app and Phoenix app.

I used wrk/ab to saturate the connection and then profiled the CPU cores. You’ll get pretty much what I mentioned above. What’s interesting is that I took down the BEAM way more times doing these benchmarks.

sync08 · March 7, 2018, 5:12pm

I think I covered most of the downsides in my post above and I agree that those could apply to some other languages too. I guess what I’m trying to get across is that Elixir is, for better or worse, created on the BEAM. This pretty much limits it for one specific use case. No matter how great the language may be, it just doesn’t make sense to use it for the language alone.

amnu3387 · March 7, 2018, 8:08pm

I’m not trying to be funny, but, “most of the downsides” seem to be a single one - and although you might be completely correct about it, I don’t see any proof of what you’re claiming. A test case that would be replicable would be a great start.

(and I personally disagree that the language alone can’t be a strong enough reason to use it - I went and looked at how you would do a controller and a webpage in ASP.net and I got syphilis out of it - but that’s my personal opinion)

rvirding · March 7, 2018, 8:21pm

I think the most important thing to understand and to use properly is the concurrency in the problem/solution/system. Using GenServers and other behaviours is just one way of doing concurrency, but it is not the only way. They are tools, and like all tools they need to be used in the right way. The problem is to get the right level of concurrency, one that suits your problem and your solution to that problem. Too much concurrency means you will be doing excess work for no real gain, and too little concurrency means you will be making your system too sequential.

Now, as has been pointed out, many packages like Phoenix already provide a pretty decent level of concurrency which is suitable for many types of applications, at least the ones they were intended for. They will do this automatically so you don’t have to think about it in most cases, but it is still there. Understanding that is necessary so you can work out how much concurrency, if any, you need to add explicitly. Unfortunately, because it is all managed for you “invisibly underneath”, many don’t realise that it is there.
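One knob for choosing how much concurrency to add explicitly is `Task.async_stream/3` with its `:max_concurrency` option. A hedged sketch (the `Bounded` module is hypothetical) that caps the number of concurrently running tasks, defaulting to the number of schedulers:

```elixir
defmodule Bounded do
  # Hypothetical helper: maps `fun` over `items` with at most `max` tasks
  # running at once. Results come back in input order.
  def map(items, fun, max \\ System.schedulers_online()) do
    items
    |> Task.async_stream(fun, max_concurrency: max, ordered: true)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

With `max` set to 2, four jobs that each sleep 50 ms take at least 100 ms. Raising `max` allows more simultaneous work; lowering it makes the pipeline more sequential.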

cmkarlsson · March 7, 2018, 8:28pm

Because this is an Elixir forum, I’ll come to the rescue of the BEAM and counter some of your arguments:

If you can’t saturate nearly 100%, there is something wrong with the system somewhere, i.e., you have a GenServer bottleneck, IO bottleneck, or NIF/BIF bottleneck somewhere. The BEAM overhead is not that much.

The order of magnitude can be correct, but then we should be talking about CPU-expensive tasks that have not been correctly off-loaded to a port/NIF, or about micro-benchmarking. From my experience working in Go, Java and Erlang, I get pretty comparable numbers on real-world applications.

Yes, Erlang is slightly slower than the other two, but we are talking 10-20% (sometimes up to 50%) here, not orders of magnitude. And I’ve had bottlenecks in the other languages too, preventing them from utilizing 100% CPU, something Go especially should be good at.

If you are stress-testing and overloading the SUT (system under test), this matches my experience too in the first iterations. When stress-testing, there is always some component that can’t handle it, and in the BEAM this may lead to rapid restarts of the supervision trees and a crash of the runtime. Java seems to stay up longer but in practice is not doing much useful work at those loads. For the BEAM, you can usually find these places and put up guards around them to make sure the traffic is dropped (for example) before reaching those parts. Any system or runtime will have these problems when overloaded for periods of time.
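A sketch of one such guard, assuming the protected component is a GenServer on the local node identified by its pid: check its mailbox length with `Process.info/2` and drop the request before the backlog grows further. `Shed.call/3` and the threshold of 100 are hypothetical choices, and the check is only approximate, since the queue can grow between the check and the call:

```elixir
defmodule Shed do
  # Hypothetical load-shedding guard for a local GenServer: refuse new
  # requests when too many messages are already waiting in its mailbox.
  def call(pid, request, max_queue \\ 100) when is_pid(pid) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, len} when len < max_queue ->
        {:ok, GenServer.call(pid, request)}

      {:message_queue_len, _len} ->
        # Drop the request here instead of adding to the backlog.
        {:error, :overloaded}

      nil ->
        # Process.info/2 returns nil when the process is no longer alive.
        {:error, :not_running}
    end
  end
end
```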

On the other hand, in practice, if you put nearly 95% load on the system, what I see is that the BEAM gives you much more consistent latency, especially compared to Java.

I agree that if you look at a basic web system, the BEAM’s fault tolerance doesn’t give you much advantage. This is because HTTP is stateless, whereas the BEAM is designed for stateful applications.

However, a system is more than that: database servers, message queues, notification servers, statistics collection, communication with other external systems. For anything that requires some sort of state, the BEAM is so much easier to work with, and if one of those parts crashes it doesn’t affect anything else in the system. Especially now that web-sockets and stateful connections are becoming more prevalent, BEAM languages have a big advantage. It makes it much easier to isolate and write robust components in Erlang/Elixir (which perhaps is your point).

For the thread in general: I came to Erlang from Java and Python, and I also could not initially see the advantages or how to work with the BEAM to make the most of it. I used processes, gen_servers and similar just for the sake of it, usually with bad results and awkward code. I think my problem was that I looked at things the wrong way. I had this amazing tool in the BEAM and I was trying to apply it everywhere. Therefore I think the original poster is correct: you may not need GenServers and supervision trees, and you should not try to force the BEAM tooling onto a problem just for the sake of it.

Instead, you should gather as much information, read as much material, and practice writing systems in OTP as much as possible. Then you will see where it is needed and how it can be applied. I’ve also noticed in the Elixir community a much larger willingness to use external libraries than in, for example, Erlang (perhaps because there aren’t many libraries there). These external libraries make use of OTP in the best way, and all you need to do is glue these components together. You get all the benefits of the BEAM without doing things yourself. The risk is that if you don’t understand the tooling, you don’t know what trade-offs you are making, you don’t know if a 3rd party library is well designed, and many times the 3rd party library is not needed at all. We learn all the time, and as you progress it will be easier to see these things.

thinkpadder1 · March 7, 2018, 9:31pm

Do you use both Mnesia and ETS primarily for caching? Or something else?

jordiee · March 7, 2018, 10:07pm

To be honest, I started with ETS for all caching but am still in the process of converting everything over to Mnesia so that I can run it in a distributed way. But yes, I always use them to store and access data that I can’t go to a DB for due to speed constraints (like token authorization).
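A minimal sketch of that kind of ETS-backed token cache, assuming a named public table created by a small owner GenServer (`TokenCache` and `:token_cache` are hypothetical names). Reads and writes go straight to ETS from the calling process, so the GenServer never becomes a bottleneck; it only exists so the table has a supervisable owner:

```elixir
defmodule TokenCache do
  # Hypothetical token -> user_id cache. The GenServer owns the ETS table;
  # callers read and write the table directly, never through the GenServer.
  use GenServer

  @table :token_cache

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def put(token, user_id), do: :ets.insert(@table, {token, user_id})

  def lookup(token) do
    case :ets.lookup(@table, token) do
      [{^token, user_id}] -> {:ok, user_id}
      [] -> :error
    end
  end

  @impl true
  def init(:ok) do
    # :public lets any process access the table; read_concurrency favors read-heavy use.
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    {:ok, nil}
  end
end
```

An ETS table is deleted when its owning process exits, and it lives on a single node, which is one reason to move to Mnesia for the distributed setup described above. This sketch also has no expiry for stale tokens.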

sasajuric · March 7, 2018, 10:54pm

I pretty much agree with what you wrote. That said, I think that GenServers/supervision trees are the pieces people should learn about, because in my experience they are great solutions in many cases, and I’ve yet to see a production system which didn’t need a GenServer or some form of supervision tree fairly early on in the game.

With a lot of hand waving, I’d say that GenServers are OTP’s built-in building block for building responsive services, Tasks are the same for non-responsive ones, and the supervision tree is the built-in service manager, like systemd or upstart. In my 10+ years of backend experience, I’ve worked on small to medium systems, and all of them needed all of these technical approaches.
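To make that analogy concrete, a small sketch of a supervision tree with one “responsive” GenServer and one “non-responsive” Task. The module names `Greeter` and `AppTree` are hypothetical. The supervisor starts its children in order, so the Task can already call the GenServer:

```elixir
defmodule Greeter do
  # Hypothetical "responsive service": answers requests from other processes.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def greet(name), do: GenServer.call(__MODULE__, {:greet, name})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:greet, name}, _from, state), do: {:reply, "Hello, #{name}", state}
end

defmodule AppTree do
  # The supervisor acts as the service manager: it starts children in order
  # and restarts them according to their restart settings.
  def start_link(report_to) do
    children = [
      Greeter,
      # A one-off "non-responsive" job that reports its result to `report_to`.
      {Task, fn -> send(report_to, {:warmup_done, Greeter.greet("warmup")}) end}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

`{Task, fun}` gets `restart: :temporary` by default, so the one-off job is not restarted after it finishes, while `Greeter` uses the GenServer default of `:permanent` and is restarted if it crashes.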

So I guess my point is that while OTP abstractions can be misused (and they frequently are), they are also very useful, and in my experience very frequently needed. I’ve tried to provide some examples of both functional and concurrent design in my To spawn or not to spawn? article. In particular, in that fairly simple example I already use a couple of GenServers and Supervisors to separate the runtime activities, and I don’t think it’s overengineered.

But I ultimately agree with you that with the ecosystem evolving, there’s less need to write GenServers ourselves, since many common cases can be covered by 3rd party libraries, such as Phoenix, Ecto and others.