You may not need GenServers and supervision trees

I’m not trying to be funny, but “most of the downsides” seem to be a single one - and although you might be completely correct about it, I don’t see any proof of what you’re claiming. A reproducible test case would be a great start.

(and I personally disagree that the language alone can’t be a strong enough reason to use it - I went and looked at how you would do a controller and a webpage in ASP.net and I got syphilis out of it - but that’s my personal opinion)

2 Likes

I think the most important thing to understand and to use properly is the concurrency in the problem/solution/system. Using GenServers and other behaviours is just one way of doing the concurrency but it is not the only way. They are tools and like all tools they need to be used in the right way. The problem is to get the right level of concurrency which suits your problem and your solution to that problem. Too much concurrency means you will be doing excess work to no real gain, and too little concurrency means you will be making your system too sequential.

Now, as has been pointed out, many packages like Phoenix already provide a pretty decent level of concurrency which is suitable for many types of applications, at least the ones they were intended for. They will do this automatically so you don’t have to think about it in most cases, but it is still there. Understanding that is necessary so you can work out how much concurrency you need to explicitly add, if any. Unfortunately because it is all managed for you “invisibly underneath” many don’t realise that it is there.

19 Likes

Because this is an elixir forum I’ll come to the rescue of the BEAM and counter some of your arguments :smiley:

If you can’t saturate the CPU to nearly 100%, there is something wrong with the system somewhere, i.e. you have a GenServer bottleneck, IO bottleneck, or NIF/BIF bottleneck somewhere. The BEAM overhead is not that much.

The order of magnitude can be correct, but then we should be talking CPU expensive tasks which have not been correctly off-loaded to a port/NIF or some micro-benchmarking. From my experience working in go, java and erlang I get pretty comparable numbers on real world applications.

Yes, erlang is slightly slower than the other two, but we are talking 10-20% (sometimes up to 50%) here, not orders of magnitude. And I’ve had bottlenecks in the other languages too that kept them from utilizing 100% CPU, something go especially should be good at.

If you are stress-testing and overloading the SUT this is my experience too in the first iterations. When stress-testing there is always some component that can’t handle it and in the BEAM this may lead to rapid restarts of the supervision trees and crash of the runtime. Java seems to stay up longer but in practice is not doing much useful work at those loads. For the BEAM you can usually find these places and put up guards around it to make sure the traffic is dropped (for example) before reaching those parts. Any system or runtime will have these problems when overloaded for periods of time.
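To make the "put up guards" idea a bit more concrete, here is a minimal sketch of dropping traffic before it reaches an overloaded process. The module name `LoadShed`, the `submit/3` function and the threshold are all hypothetical, and checking the mailbox length is only one of several possible guard strategies.

```elixir
defmodule LoadShed do
  # Hypothetical threshold; a real system would tune this from measurements.
  @max_queue 100

  # Forward `msg` to `pid` only if its mailbox is not already backed up.
  def submit(pid, msg, max_queue \\ @max_queue) do
    case Process.info(pid, :message_queue_len) do
      {:message_queue_len, len} when len < max_queue ->
        send(pid, msg)
        :ok

      {:message_queue_len, _len} ->
        # Shed the request instead of piling more work onto the process.
        {:error, :overloaded}

      nil ->
        # Process.info/2 returns nil when the process is not alive.
        {:error, :noproc}
    end
  end
end

# A stand-in worker that never reads its mailbox, so messages pile up.
pid = spawn(fn -> Process.sleep(:infinity) end)

results = for _ <- 1..3, do: LoadShed.submit(pid, :request, 2)
IO.inspect(results)
```

With a limit of 2, the third request should be rejected rather than queued, which is the behaviour you want in front of a component that cannot keep up.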

On the other hand, in practice, if you put nearly 95% load on the system, what I see is that the BEAM gives you much more consistent latency, especially compared to java.

I agree if you look at the basic web system, the BEAM’s fault tolerance doesn’t give you much advantage. This is because HTTP is stateless, whereas BEAM is designed for stateful applications.

However a system is more than that: database servers, message queues, notification servers, statistics collecting, communication with other external systems. For anything that requires some sort of state the BEAM is so much easier to work with, and if one of those parts crashes it doesn’t affect anything else in the system. Especially now when web-sockets and stateful connections are becoming more prevalent, BEAM languages have a big advantage. It makes it much easier to isolate and write robust components in erlang/elixir (which perhaps is your point).

For the thread in general: I came to erlang from java and python and I also could not initially see the advantages or how to work with the BEAM to make the most out of it. I used processes and gen_servers and similar just for the sake of it, usually with bad results and awkward code. I think my problem was that I looked at things the wrong way. I had this amazing tool in the BEAM and I was trying to apply it everywhere. Therefore I think the original poster is correct. You may not need GenServers and supervision trees and you should not try to force the BEAM tooling onto a problem just for the sake of it.

Instead you should get as much information, read as much material, and practice to write systems in OTP as much as possible. Then you will see where it is needed and how it can be applied. I’ve also noticed in the elixir community a much larger willingness to use external libraries than in for example erlang (perhaps because there aren’t many libraries there :wink: ). These external libraries make use of OTP in the best way and all you need to do is glue these components together. You get all the benefits of BEAM without doing things yourself. The risk is, if you don’t understand the tooling you don’t know what trade-offs you are making, you don’t know if a 3rd party library is well designed and many times the 3rd party library is not needed at all. We learn all the time and as you progress it will be easier to see these things.

21 Likes

Do you use both mnesia and ets primarily for caching? Or something else?

To be honest I started with ets for all caching but am still in the process of converting everything over to mnesia so that I can run it in a distributed way. But yes, I always use them to store and access data that I can’t reach to a db for due to speed constraints (like token authorization).

I pretty much agree with what you wrote. That said, I think that GenServer/supervision trees are the pieces people should learn about, because in my experience they are great solutions in many cases, and I’ve yet to see a production system which didn’t need a GenServer or some form of supervision tree fairly early on in the game.

With a lot of hand waving, I’d say that GenServers are OTP’s built-in building block for building responsive services, Tasks are the same for non-responsive ones, and the supervision tree is the built-in service manager like systemd or upstart. In the past 10+ years of my backend experience, I’ve worked on small to medium systems, and all of them needed all of these technical approaches.
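For readers who haven’t seen the three pieces side by side, a minimal sketch might look like this. The `Counter` module is a made-up example, not taken from any post or article in this thread.

```elixir
defmodule Counter do
  use GenServer

  # Client API (hypothetical example service)
  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)

  # Server callbacks
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count) do
    # Reply with the new count and keep it as the new state.
    {:reply, count + 1, count + 1}
  end
end

children = [
  # "Responsive" service: answers requests, restarted if it crashes.
  {Counter, 0},
  # "Non-responsive" one-off job: runs and exits, nobody calls into it.
  {Task, fn -> IO.puts("one-off background work") end}
]

# The supervisor plays the service-manager role for both children.
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

IO.inspect(Counter.increment())
IO.inspect(Counter.increment())
```

The point of the sketch is only the division of roles: the GenServer holds state and answers calls, the Task does fire-and-forget work, and the supervisor owns the lifecycle of both.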

So I guess my point is that while OTP abstractions can be misused (and they frequently are), they are also very useful, and in my experience very frequently needed. I’ve tried to provide some examples of both functional and concurrent design in my To spawn or not to spawn? article. In particular, in that fairly simple example I already use a couple of GenServers and Supervisors to separate the runtime activities, and I don’t think it’s overengineered.

But I ultimately agree with you that with the ecosystem evolving, there’s less need to write GenServers ourselves, since many common cases can be covered by 3rd party libraries, such as Phoenix, Ecto and others.

21 Likes

This makes me wonder who promised you a 100% saturated CPU with the BEAM? It’s common knowledge that Erlang / Elixir should not be used for heavy number-crunching. The overhead you speak of is basically orchestration and coordination and every system that does the same – Kubernetes included – has it. I can’t really pinpoint your gripe with the BEAM; it never promised to be a C / C++ replacement. Unless I am misunderstanding you?

No, you don’t. Even Java, to this day, struggles to give you something transparent like Task.async_stream. I’ve never seen a dynamic language that is actually able to do it (of course, I don’t know them all). In my 16.5 years of total experience Elixir is the first language that gave me a mechanism to distribute work on all CPU cores as an integrated part of the code pipeline. Ruby, PHP, Python, Javascript – they cannot get it right even today despite the continuous and rather hilarious attempts. The best Java did was try and imitate the OTP (Akka framework) and it failed to provide half the guarantees of it. I don’t imagine C# being in a much better shape but maybe you will correct me.
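For anyone who hasn’t used it, this is roughly what Task.async_stream looks like inside an ordinary pipeline. The `Work.busy_sum/1` function and the input sizes are invented here purely to have something CPU-bound to run.

```elixir
defmodule Work do
  # Hypothetical CPU-bound function: sums 1..n the slow way.
  def busy_sum(n), do: Enum.reduce(1..n, 0, fn i, acc -> acc + i end)
end

inputs = [1_000_000, 2_000_000, 3_000_000, 4_000_000]

results =
  inputs
  # Each element is processed in its own BEAM process, with at most one
  # in flight per online scheduler; results come back in input order.
  |> Task.async_stream(&Work.busy_sum/1,
    max_concurrency: System.schedulers_online(),
    timeout: 30_000
  )
  |> Enum.map(fn {:ok, sum} -> sum end)

IO.inspect(results)
```

Swapping `Task.async_stream` for `Enum.map` (and dropping the `{:ok, _}` unwrapping) gives the sequential version, which is what "an integrated part of the code pipeline" means in practice.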

C# and Java are still stuck trying to provide POSIX-like semantics (mutexes, semaphores, condition variables) and most of their multicore story PR is to hand-wave away the problem that giving programmers direct control of OS threads is never going to work – because it still hasn’t worked; deadlocks in C / C++ are the most normal thing in the world even today (in Go as well, to a lesser extent).

Strongly disagree. The fact that they protect you from a plethora of nasty synchronization bugs which many other frameworks have might be giving you the wrong impression that all other frameworks do the same. Which is sadly not true.

The biggest app I participated in still has the problem of very occasionally disconnecting from the Postgres DB server; Ecto made this totally transparent: it simply reconnects and retries the SQL command and we wouldn’t even know of the problem if we didn’t have paranoid logging. And it never really caused a problem. The fault tolerance made this problem just be of a curious value and didn’t force anyone to go into firefighting mode.

Rails handles them just fine, yeah… And is doing so 85x-100x slower than Phoenix and Absinthe. That’s not a joke; I already rewrote 2 big Rails apps to Phoenix and Absinthe and the average response times went from 310ms to 3.5ms. I watched the real-time graphs and was shaking my head for 10 good minutes back then. Also, Rails apps had caching; Elixir apps still don’t have it.

Examples? Those you gave are very generalized and I cannot see how we could discuss them without more details.

Elixir is like any other language and tech – it’s a tool and you always have to pick the right tool for the job. Its drawbacks are mostly that it isn’t suited for number crunching and things like DB indices (namely large mutable data structures that have to be modified in-place and very quickly). Outside of that, I can’t find a flaw in Elixir or the BEAM; wrote 3 commercial projects with it so far and have at least 5 personal smaller projects and it made me so much more productive than before.

Pardon the probably inaccurate observation – you do seem like a person who judges Elixir for promises that it never made. Maybe your work simply isn’t well-suited for the BEAM languages? That’s quite okay, they never claimed to be end-all, be-all. (I wouldn’t ever try writing real-time video streaming in Elixir for example; probably explains why most of Twitch’s infrastructure is in Go.)

This is a much bigger niche than you imply. I personally wrote tens of thousands of code lines in C++, Java and Ruby trying to achieve fault-tolerant server-side apps and never succeeded – many others like myself failed as well. Sadly @rvirding is right: most of us write half-done Erlang OTP variants while we work outside of the BEAM. Only took me 14 years to realize it but what can you do.

To have a productive discussion, I believe you should give concrete examples of projects where you think the BEAM languages are a poor fit.

Apologies if I misunderstood you anywhere along the way.

8 Likes

This is plain false. It’s trivial to saturate all your cores with the BEAM. The overhead is minimal compared to CPU-intensive workloads and there is no bookkeeping that will ever contend with an actual CPU-bound task. As @cmkarlsson said you have a bottleneck somewhere in your system that is causing this. Even with thousands of processes being scheduled over several cores you should be able to keep your processors pinned given an actual CPU-bound, consistent workload.

2 Likes

So you are using a mostly IO bound workload to profile the CPU? Not maxing out the CPU may actually be a good sign. Because you are being vague on the details, I am free to interpret your data like this:

  • C# had to serve fewer requests because it maxed out the CPU
  • Elixir was able to serve more requests and have spare CPUs

If that’s the case, I will pick the second, thank you.

That’s why when talking about benchmarks, we need numbers and methodology. There are hundreds of things that could go wrong. Or even when the measurement is right, we draw the wrong conclusions. So unless you can provide applications, benchmark tools and methodology, there is nothing to conclude and nothing to discuss.

Can you please provide an actual example? Please let us know your OTP version and OS too. In literally years of benchmarking Elixir applications with tools like wrk, I never brought it down. Even when opening 2 million connections - where we used 40 different client machines to benchmark a single server.

It is not about bad requests bringing down the server but how you semantically react to those. I have literally seen frameworks and libraries rescuing OutOfMemoryError and putting systems in an unmanageable state because of that.

Still, focusing on bad requests is a gross misrepresentation of what fault tolerance means in Elixir. It also drastically undervalues the benefits of processes in designing those systems. Some examples:

  • Ecto being fault-tolerant means safer design around connection pools (and leaking of connections)
  • Phoenix being fault-tolerant means we can easily multiplex multiple channels over the same websocket connection and save on system resources, while also scheduling on CPU and IO bound work
  • Ecto being built on top of processes grants an excellent amount of visibility into the system. You can navigate process trees, inspect the pool memory, state, queue size and more
  • Phoenix being built on top of processes means no stop the world GC and per process garbage collection

And the list could go on and on.
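As a small illustration of the visibility point, any process can be inspected from the outside with standard tools. The Agent below is only a stand-in for a stateful library process; it is not Ecto’s actual pool, and the state shape is invented.

```elixir
# Stand-in for a stateful library process; NOT Ecto's real connection pool.
{:ok, pid} = Agent.start_link(fn -> %{checked_out: 0, idle: 5} end)

# Runtime facts about the process: mailbox length, memory in bytes, status.
info = Process.info(pid, [:message_queue_len, :memory, :status])

# The process's internal state, fetched without any cooperation from its code.
# :sys.get_state/1 is meant for debugging, not for regular application logic.
state = :sys.get_state(pid)

IO.inspect(info, label: "process info")
IO.inspect(state, label: "internal state")
```

The same information is what graphical tools such as Observer show when you click through a supervision tree.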

You did not. You made vague statements. “It isn’t fast”. For what? Compared to what? You said it “doesn’t maximise the CPU” but you didn’t provide an example workload. It fails during benchmarking? How? What errors? Under which scenarios?

Yet we see companies using it for data processing with GenStage and Flow. Or for the web with Phoenix. Or for embedded devices with Nerves. I recommend folks to look at videos from conferences such as Empex, ElixirConf, CodeBEAM and others to learn more about the variety of use cases BEAM is deployed to.

24 Likes

I wrote this blog post about avoiding GenServer bottlenecks. The best architecture is generally to model the natural concurrency of your system, and Elixir makes it really easy and safe to handle concurrency. The language is also a pleasure to write in.

8 Likes

Sorry for a bit of thread resurrection, but I took the initial post and enhanced it a bit with the discussions here, quoting some peeps + some other comments and posted it as a blog post.

Thanks for your input once again everyone!

3 Likes

I am actually still finding myself nodding in agreement with the title and your general premise – including the newer blog post.

I came for OTP. I stayed for the functional programming.

Additionally, Ecto and Phoenix already make very good use of OTP. So truthfully, if you use either (or both) you already are reaping the benefits of OTP, as stated by @cmkarlsson and others here.

4 Likes

having just Phoenix and Ecto be fault tolerant buys you nothing. Any modern web framework (regardless of language) can handle bad requests without bringing down the server

Fault tolerance isn’t just “what happens if there’s an exception”, it’s also how you handle load. If you can’t accept connections or if you take 30 seconds to reply, you’re effectively down.

What follows is my theoretical understanding.

Many web frameworks require running 1 OS thread per web request. Eg, you explicitly configure how many processes and threads per process to run if you’ve got a Rails app running on Puma. If you have 16 total threads and you get 17 simultaneous web requests, one of them is waiting in line. The 16th user is (hopefully) getting a nice response, and the 17th is hung entirely. Anyone making 20 requests at a time has got you with a denial of service attack.

Phoenix (via Cowboy) runs one BEAM process per request, and we know we can run millions of those. A million simultaneous web requests will probably hit some other bottleneck, like the database, of course, but Elixir itself will just keep adding processes as needed, each one slowing down the existing ones very slightly as they share scheduler time, giving a very smooth degradation under load, all without you having to think about it.
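To get a feel for how cheap a BEAM process is, here is a rough sketch that spawns one process per pretend request. The count of 100,000 is an arbitrary number chosen to stay well under the VM’s default process limit; it is not a benchmark and says nothing about what Cowboy does internally.

```elixir
parent = self()
n = 100_000

# The VM's configured ceiling on simultaneously existing processes.
IO.inspect(:erlang.system_info(:process_limit), label: "process limit")

# One lightweight process per pretend "request"; each reports back when done.
for i <- 1..n do
  spawn(fn -> send(parent, {:done, i}) end)
end

# Collect one reply per spawned process.
count =
  Enum.reduce(1..n, 0, fn _, acc ->
    receive do
      {:done, _i} -> acc + 1
    end
  end)

IO.inspect(count, label: "requests handled")
```

On typical hardware this completes quickly, which is why nobody configures a "process pool size" for request handling the way you configure a thread pool.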

5 Likes

@sync08: Did you ever figure out why you weren’t able to saturate your cores and why C# pinned them while doing IO?

Web frameworks generally have been moving away from that for quite a while now. Basically since the C10K problem (a term coined in 1999) and the C10M problem became a thing. This has been sped up recently due to better OS support (epoll, IOCP) and even more so because of better language support (async/await in C# for example).

Also, the 17th request can be handled by just starting a new thread (or OS process) in even the most primitive webserver. That may not be the most resource efficient way to do it, but it certainly isn’t a denial of service. How do you think the current internet could even function if that were true?

See https://mrotaru.wordpress.com/2015/05/20/how-migratorydata-solved-the-c10m-problem-10-million-concurrent-connections-on-a-single-commodity-server/ for an example of doing 10 million concurrent connections with java from 2015. And note how the article is more about OS and network config and not at all about how to write the code, because that is a solved problem.

1 Like

He’s not wrong. His example of Puma - 16 threads - is the default for that Webserver: https://github.com/puma/puma#thread-pool

You can set it to 16 or 16,000, either way when you run out of threads the new requests are blocking.

1 Like

OS threads are a limited resource and most OS-es struggle under pressure when having to spawn thousands of them.

This is as DoS as it gets. Make a lot of requests that are costly for the OS and the hardware and they buckle under pressure. Maybe we have different definitions of DoS in mind?

Current internet is barely limping as is evident by the average load times of most websites and the fact that the best way to kill a website is to post a link to it on a popular aggregator like HackerNews or Reddit.

Apart from the few smart folks that setup a main website hosted on services like Netlify / CloudFlare, most websites are a DoS waiting to happen.

It’s true that a good chunk of a well-oiled webserver magic lies in OS configuration, everybody knows that.

But to claim that how to write the code for it is a solved problem is simply false.

I don’t have any intimate knowledge of Puma (or Ruby for that matter), but a quick search shows there are alternative servers for ruby which do asynchronous IO and/or Fibers. They might well exist because Puma is doing it wrong. A lot of older frameworks do work that way, and it is often hard to change existing code to work in a fundamentally different way.

However, most webframeworks will run multiple threads. It’s what happens inside those threads that matters. Are those threads unavailable until the request is fully served, or do they get put back into the thread pool when they are waiting for something external? That is the game changer. In the first case your CPU may be sitting there idle because all threads are ‘busy’ waiting for something from the disk or the database. The latter case allows the CPU to work on other things (e.g. new incoming requests) in the meantime.
This is also basically the way the Beam VM works: Beam also starts an OS thread per core and then runs multiple Erlang processes on these threads in turn.
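A quick way to see this on your own machine, as a sketch: spawn far more waiting processes than there are scheduler threads and time how long they take together. The numbers (1,000 processes, 50 ms each) are arbitrary, and `Process.sleep/1` is only standing in for waiting on a disk or database.

```elixir
# Number of scheduler threads the VM is currently using (one per core by default).
IO.inspect(System.schedulers_online(), label: "scheduler threads")

parent = self()

# Spawn many more BEAM processes than scheduler threads.
# Each one waits 50 ms (as if on IO) and then reports back.
pids =
  for i <- 1..1_000 do
    spawn(fn ->
      Process.sleep(50)
      send(parent, {:finished, i})
    end)
  end

# Time how long it takes until every process has reported back.
{elapsed_us, :ok} =
  :timer.tc(fn ->
    for _ <- pids do
      receive do
        {:finished, _i} -> :ok
      end
    end

    :ok
  end)

IO.puts("all #{length(pids)} waits finished in #{div(elapsed_us, 1000)} ms")
```

If a waiting process held on to its thread, the total would be closer to 1,000 × 50 ms divided by the thread count; because a waiting process is simply descheduled, the waits overlap instead.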

I’ll dig a bit deeper into (my view of) the basic principles behind this.
Firstly it’s important to note that this is only relevant when there is IO being done on which the CPU has to wait before it can finish the work. Imagine a service which just calculates the square root of an input number, nothing else. In that case running a thread per core and just processing the incoming requests in sequence is the most efficient way of doing it. There are two possible outcomes: either you keep up with the requests, or you max the CPU and still can’t keep up. In that last case you simply lack the computing power and there is nothing you could have done differently which would have made it work. Worse still, anything you change will mean more overhead, reducing the number of requests you can process.

But most software isn’t like that; any webserver needs to deal with reading from and writing to the network at the least. But it’s likely doing logging to disk, reading files, talking to databases or other services, etc. Because it’s a waste of CPU cycles to do nothing while waiting for those external things, you want to do multiple things concurrently: if one piece of code is waiting for something you should run something else in that time. On the conceptual level that is all, it isn’t more complicated than that. At least, when just looking at resource usage, more on that later.

On the technical level the OS solves this for you, just run a process for each piece of work you’ve got and the OS will schedule those processes on the CPU and you’re done. Conceptually there is nothing wrong with that. But a process has all sorts of stuff attached to it (users, permissions, open files, etc.), which means that each process needs quite a bit of memory just to exist and on top of that it means that switching between processes is quite a bit of work. It works fine, it’s just inefficient.
To improve on that we have threads, which are just ‘smaller’ processes which belong to the same process. It’s still the OS that is scheduling them, but there is less information attached to a thread which means they use less memory and switch faster. But it still has stuff attached to it, and we still move in and out of the kernel when switching. This also works fine, but there is still a fair bit of overhead.

Now if we want to be more efficient than threads are, there’s only one option left. We need to start scheduling the work by ourselves. And because the VM running the program has more knowledge of what is going on in the program it’s in a better position to keep track of pieces of work and do the scheduling with minimal overhead. Not zero overhead, just less overhead. This is what Beam has been doing from the start. But it is also basically what the ‘green threads’, ‘fibers’, ‘async’, ‘non-blocking’ things boil down to.

Now there is a bit more to it than just getting work done as fast as possible. When there is one chunk of work which needs an hour of CPU time to complete you likely don’t want everybody else to wait for that. So you start switching between different pieces of work in a way you deem fair, even though that creates extra overhead. How to do this is a whole science in itself with loads of trade-offs. The OS does schedule the threads in a way it deems sensible; you have some influence on that but it’s limited. Therefore, doing your own scheduling also allows you to make certain choices yourself, instead of relying on the OS. Like how you deal with high loads, whether you favor consistent response times over better average performance, how you deal with failing ‘threads’, etc. I’m guessing that for Beam those things were actually more important than the efficiency.

There are also downsides to doing scheduling yourself. Firstly it’s not easy, secondly you might need some knowledge of the hardware to get the most out of it. For example, when multiple CPUs/cores became a thing the Beam VM needed to be changed to support it, while those using threads got it for free as soon as the OS supported it. But taking things like hyperthreading, CPU cache size, etc. into account can allow you to improve scheduling. The OS is in a far better position to take those things into account, and with increasingly complex CPUs (core complexes, big.LITTLE architectures) this is more relevant than before.

While user-level scheduling is all the rage right now (for good reasons), there is already movement towards improving the OS to support more efficient threading. My guess is that this will cause a reverse movement in the future, because the effort of implementing scheduling yourself may at some point not be worth it anymore. The Beam VM might be the outlier there, as it has some design goals which differ from what mainstream software is doing.

Note that this is all about how the software eventually runs on a machine; how a programming language supports this (and hopefully makes it easy) is a whole different topic which often muddles the discussions about it. Even more so because languages are often tied to a specific VM and thus tied in to a specific way of scheduling work. Not to mention the fact that the terminology is often confusing as well.

1 Like

I’ll try to keep this reply a bit shorter :wink:

PS C:\Windows\system32> (Get-Process|Select-Object -ExpandProperty Threads).Count
3262

That’s my laptop right now doing nothing special but running thousands of threads.

Hardware is the ultimate limited resource. Therefore you can DoS anything if you manage to create more work than the hardware can handle. It’s just easier if the system you’re trying to DoS isn’t making efficient use of the hardware. Adding Cloudflare is nothing but adding more hardware…

So what I’m saying here is that starting threads is not fundamentally wrong, just not the most efficient. There are two ways of dealing with that: don’t use lots of threads, or make threads more efficient. Both are valid options, neither will allow you to exceed the limits of the hardware.

I disagree, but I may have a different definition of ‘solved’.
Driving 400km/h in a car is a solved problem. The fact that most cars aren’t nearly as fast as that doesn’t change that. The knowledge required to engineer a car which can both reach and deal with those speeds exists. That makes it a solved problem.

People run servers dealing with millions of concurrent connections; it’s all researched, documented and even available in free existing tooling. So that too is a solved problem. That’s not the same as it being trivial, or cheap, or easy, or even common knowledge. And I’ll happily agree it’s not trivial just yet, and it should become so easy we don’t even think about it anymore.

I am familiar with async I/O. It’s a big part of the reason I use Elixir, because I get a sequential programming model while everything is event driven below it, and this is true across the entire ecosystem. This is not true of other popular web frameworks such as Ruby on Rails (Ruby), Django (Python), Spring (Java) or Rocket (Rust), which rely on many synchronous I/O libraries. In those other languages you have to take care to use libraries compatible with an evented I/O model (kqueue/epoll), and the majority of their ecosystems are not compatible with that. So yes, Event Machine, Tornado, Akka/Play and Actix all exist and you can use them, but in most cases they are not the most popular choice because it is easy to scale webservers horizontally and hard to replace the entire ecosystem of libraries.

2 Likes