You may not need GenServers and supervision trees

Yeah, but I get that with many languages now. I’m still half in the C# world, and despite the hype about Erlang/Elixir multi-core usage, it’s actually not that great at it. In my benchmarks, saturating all cores caps out at about 75% CPU usage; in C# I can get almost 100%. And even if I could push Elixir to utilize the full 100% (you can’t, it’s fundamental BEAM overhead), the performance still wouldn’t come anywhere close. I’m talking orders of magnitude difference in most tasks.

What I will say is that concurrency in Elixir is way easier and safer.

Is it possible for you to show an example of code that C# parallelizes so much better?

2 Likes

If we’re sticking with the topic of this thread then the most basic and relevant demo would be to compare a newly created ASP.NET Core web app and Phoenix app.

I used wrk/ab to saturate the connections and then profiled the CPU cores. You’ll get pretty much what I mentioned above. What’s interesting is that I took down the BEAM way more times doing these benchmarks.

I think I covered most of the downsides in my post above, and I agree that those could apply to some other languages too. I guess what I’m trying to get across is that Elixir is, for better or worse, built on the BEAM. This pretty much limits it to one specific use case. No matter how great the language may be, it just doesn’t make sense to use it for the language alone.

I’m not trying to be funny, but “most of the downsides” seem to be a single one – and although you might be completely correct about it, I don’t see any proof of what you’re claiming. A replicable test case would be a great start.

(and I personally disagree that the language alone can’t be a strong enough reason to use it - I went and looked at how you would do a controller and a webpage in ASP.net and I got syphilis out of it - but that’s my personal opinion)

2 Likes

I think the most important thing is to understand, and use properly, the concurrency in the problem/solution/system. Using GenServers and other behaviours is just one way of doing the concurrency, but it is not the only way. They are tools, and like all tools they need to be used in the right way. The problem is to get the right level of concurrency which suits your problem and your solution to it. Too much concurrency means you will be doing excess work for no real gain, and too little means you will be making your system too sequential.
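To make that concrete, here is a minimal sketch (the hashing is just a stand-in for whatever CPU-bound work you have): rather than spawning a process per item or staying fully sequential, `Task.async_stream/3` lets you choose a concurrency level that matches the machine.

```elixir
# Stand-in for real CPU-bound work; any pure function would do.
work = fn n -> :crypto.hash(:sha256, Integer.to_string(n)) end

# Too little concurrency would be a plain Enum.map (fully sequential);
# too much would be one process per item up front.
# A reasonable level: one worker per scheduler.
results =
  1..10_000
  |> Task.async_stream(work, max_concurrency: System.schedulers_online())
  |> Enum.map(fn {:ok, hash} -> hash end)
```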

Now, as has been pointed out, many packages like Phoenix already provide a pretty decent level of concurrency which is suitable for many types of applications, at least the ones they were intended for. They do this automatically so you don’t have to think about it in most cases, but it is still there. Understanding that is necessary so you can work out how much concurrency you need to add explicitly, if any. Unfortunately, because it is all managed for you “invisibly underneath”, many don’t realise that it is there.

19 Likes

Because this is an Elixir forum I’ll come to the rescue of the BEAM and counter some of your arguments :smiley:

If you can’t saturate the cores to nearly 100%, there is something wrong in the system somewhere, i.e. you have a GenServer bottleneck, an IO bottleneck, or a NIF/BIF bottleneck. The BEAM overhead is not that large.

The order-of-magnitude difference can be real, but then we are talking about CPU-expensive tasks which have not been correctly off-loaded to a port/NIF, or about micro-benchmarking. From my experience working in Go, Java and Erlang, I get pretty comparable numbers on real-world applications.

Yes, Erlang is slightly slower than the other two, but we are talking 10-20% (sometimes up to 50%) here, not orders of magnitude. And I’ve had bottlenecks in the other languages too, making them unable to utilize 100% CPU, and that is something Go especially should be good at.

If you are stress-testing and overloading the SUT (system under test), this is my experience too in the first iterations. When stress-testing, there is always some component that can’t handle the load, and on the BEAM this may lead to rapid restarts of supervision trees and a crash of the runtime. Java seems to stay up longer, but in practice it is not doing much useful work at those loads. On the BEAM you can usually find these places and put up guards around them so that traffic is dropped (for example) before reaching those parts. Any system or runtime will have these problems when overloaded for long enough.
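As a rough sketch of such a guard (the threshold and names here are made up, not from any real system): check the target process’s mailbox before forwarding traffic, and drop the request if it is already backed up.

```elixir
defmodule Guarded do
  # Hypothetical threshold; tune it against real measurements.
  @max_queue 1_000

  # Drop the request instead of letting it pile up in an overloaded mailbox.
  def call(server, request, timeout \\ 5_000) do
    with pid when is_pid(pid) <- Process.whereis(server),
         {:message_queue_len, len} when len <= @max_queue <-
           Process.info(pid, :message_queue_len) do
      GenServer.call(pid, request, timeout)
    else
      # Process missing or mailbox too long: shed the load.
      _ -> {:error, :overloaded}
    end
  end
end
```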

On the other hand, in practice, if you put nearly 95% load on the system, what I see is that the BEAM gives you much more consistent latency, especially compared to Java.

I agree that if you look at a basic web system, the BEAM’s fault tolerance doesn’t give you much of an advantage. This is because HTTP is stateless, whereas the BEAM is designed for stateful applications.

However, a system is more than that: database servers, message queues, notification servers, statistics collection, communication with external systems. For anything that requires some sort of state, the BEAM is so much easier to work with, and if one of those parts crashes it doesn’t affect anything else in the system. Especially now, when websockets and stateful connections are becoming more prevalent, BEAM languages have a big advantage. It is much easier to isolate and write robust components in Erlang/Elixir (which perhaps is your point).

For the thread in general: I came to Erlang from Java and Python, and I also could not initially see the advantages or how to work with the BEAM to make the most of it. I used processes and gen_servers and the like just for the sake of it, usually with bad results and awkward code. I think my problem was that I looked at things the wrong way: I had this amazing tool in the BEAM and I was trying to apply it everywhere. Therefore I think the original poster is correct. You may not need GenServers and supervision trees, and you should not try to force the BEAM tooling onto a problem just for the sake of it.

Instead, you should gather as much information, read as much material, and practice writing systems in OTP as much as possible. Then you will see where it is needed and how it can be applied. I’ve also noticed in the Elixir community a much larger willingness to use external libraries than in, for example, Erlang (perhaps because there aren’t many libraries there :wink: ). These external libraries make use of OTP in the best way, and all you need to do is glue the components together. You get all the benefits of the BEAM without doing things yourself. The risk is that if you don’t understand the tooling, you don’t know what trade-offs you are making, you don’t know whether a 3rd-party library is well designed, and many times the 3rd-party library is not needed at all. We learn all the time, and as you progress it will become easier to see these things.

21 Likes

Do you use both mnesia and ets primarily for caching? Or something else?

To be honest, I started with ETS for all caching but am still in the process of converting everything over to Mnesia so that I can run it in a distributed way. But yes, I always use them to store and access data that I can’t go to a DB for due to speed constraints (like token authorization).
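For anyone curious, a minimal sketch of what such an ETS token cache can look like (the module name and TTL handling are illustrative, not my actual code):

```elixir
defmodule TokenCache do
  @table :token_cache

  # Create the table once at startup; reads go straight to ETS, no process hop.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
  end

  # Store claims with an absolute expiry timestamp.
  def put(token, claims, ttl_seconds) do
    expires_at = System.system_time(:second) + ttl_seconds
    :ets.insert(@table, {token, claims, expires_at})
  end

  # Look up a token, treating expired entries as misses.
  def get(token) do
    case :ets.lookup(@table, token) do
      [{^token, claims, expires_at}] ->
        if expires_at > System.system_time(:second), do: {:ok, claims}, else: :error

      [] ->
        :error
    end
  end
end
```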

I pretty much agree with what you wrote. That said, I think that GenServers and supervision trees are the pieces people should learn about, because in my experience they are great solutions in many cases, and I’ve yet to see a production system which didn’t need a GenServer or some form of supervision tree fairly early in the game.

With a lot of hand-waving, I’d say that GenServers are OTP’s built-in building block for responsive services, Tasks are the same for non-responsive ones, and a supervision tree is the built-in service manager, like systemd or upstart. In my past 10+ years of backend experience, I’ve worked on small to medium systems, and all of them needed all of these technical approaches.
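A hand-wavy sketch of that analogy in code (module names are made up): the supervisor plays the systemd role and restarts its “services” independently.

```elixir
defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    children = [
      # A long-lived, responsive service (MyApp.Cache stands in for any GenServer you own).
      MyApp.Cache,
      # A runner for one-off, non-responsive jobs.
      {Task.Supervisor, name: MyApp.TaskSupervisor}
    ]

    # Like systemd: if a child dies, restart just that child.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```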

So I guess my point is that while OTP abstractions can be misused (and they frequently are), they are also very useful, and in my experience very frequently needed. I’ve tried to provide some examples of both functional and concurrent design in my To spawn or not to spawn? article. In particular, in that fairly simple example I already use a couple of GenServers and Supervisors to separate the runtime activities, and I don’t think it’s overengineered.

But I ultimately agree with you that with the ecosystem evolving, there’s less need to write GenServers ourselves, since many common cases can be covered by 3rd party libraries, such as Phoenix, Ecto and others.

21 Likes

This makes me wonder: who promised you a 100% saturated CPU with the BEAM? It’s common knowledge that Erlang/Elixir should not be used for heavy number-crunching. The overhead you speak of is basically orchestration and coordination, and every system that does the same – Kubernetes included – has it. I can’t really pinpoint your gripe with the BEAM; it never promised to be a C/C++ replacement. Unless I am misunderstanding you?

No, you don’t. Even Java, to this day, struggles to give you something as transparent as Task.async_stream. I’ve never seen a dynamic language that is actually able to do it (of course, I don’t know them all). In my 16.5 years of total experience, Elixir is the first language that gave me a mechanism to distribute work across all CPU cores as an integrated part of the code pipeline. Ruby, PHP, Python, JavaScript: they cannot get it right even today, despite the continuous and rather hilarious attempts. The best Java did was try to imitate OTP (the Akka framework), and it failed to provide half of its guarantees. I don’t imagine C# is in much better shape, but maybe you will correct me.
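For readers who haven’t seen it, a small sketch of what “an integrated part of the code pipeline” means; `fetch` and `parse` below are placeholders for real work, not a real API:

```elixir
# Placeholders for real HTTP fetching and parsing.
fetch = fn url -> {url, byte_size(url)} end
parse = fn {url, size} -> "#{url} -> #{size} bytes" end

# Swap one Enum.map for Task.async_stream and the rest of the
# pipeline is untouched, while the work spreads across all cores.
["https://example.com/a", "https://example.com/b", "https://example.com/c"]
|> Task.async_stream(fetch, max_concurrency: System.schedulers_online())
|> Enum.map(fn {:ok, result} -> parse.(result) end)
```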

C# and Java are still stuck trying to provide POSIX-like semantics (mutexes, semaphores, condition variables), and most of their multicore PR story is to hand-wave away the problem that giving programmers direct control of OS threads is never going to work; it still hasn’t worked, and deadlocks in C/C++ are the most normal thing in the world even today (in Go as well, to a lesser extent).

Strongly disagree. The fact that they protect you from a plethora of nasty synchronization bugs which many other frameworks have might be giving you the wrong impression that all other frameworks do the same, which is sadly not true.

The biggest app I participated in still has the problem of very occasionally disconnecting from the Postgres DB server; Ecto makes this totally transparent: it simply reconnects and retries the SQL command, and we wouldn’t even know about the problem if we didn’t have paranoid logging. It never really caused trouble. The fault tolerance turned the problem into a mere curiosity and didn’t force anyone into firefighting mode.

Rails handles them just fine, yeah… and it does so 85x-100x slower than Phoenix and Absinthe. That’s not a joke; I have already rewritten 2 big Rails apps in Phoenix and Absinthe, and the average response times went from 310ms to 3.5ms. I watched the real-time graphs and was shaking my head for a good 10 minutes back then. Also, the Rails apps had caching; the Elixir apps still don’t.

Examples? Those you gave are very generalized, and I cannot see how we could discuss them without more details.

Elixir is like any other language and tech – it’s a tool, and you always have to pick the right tool for the job. Its drawbacks are mostly that it isn’t suited for number crunching and things like DB indices (namely, large mutable data structures that have to be modified in place, and very quickly). Outside of that, I can’t find a flaw in Elixir or the BEAM; I’ve written 3 commercial projects with it so far, have at least 5 smaller personal projects, and it has made me so much more productive than before.

Pardon the probably inaccurate observation – you do seem like a person who judges Elixir by promises it never made. Maybe your work simply isn’t well-suited for the BEAM languages? That’s quite okay; they never claimed to be the be-all and end-all. (I wouldn’t ever try writing real-time video streaming in Elixir, for example; that probably explains why most of Twitch’s infrastructure is in Go.)

This is a much bigger niche than you imply. I personally wrote tens of thousands of lines of code in C++, Java and Ruby trying to achieve fault-tolerant server-side apps and never succeeded; many others like myself have failed as well. Sadly, @rvirding is right: most of us write half-done Erlang OTP variants when we work outside the BEAM. It only took me 14 years to realize it, but what can you do.

To have a productive discussion, I believe you should give concrete examples of projects where you think the BEAM languages are a poor fit.

Apologies if I misunderstood you anywhere along the way.

8 Likes

This is plain false. It’s trivial to saturate all your cores with the BEAM. The overhead is minimal for CPU-intensive workloads, and there is no bookkeeping that will ever contend with an actual CPU-bound task. As @cmkarlsson said, you have a bottleneck somewhere in your system that is causing this. Even with thousands of processes being scheduled over several cores, you should be able to keep your processors pinned given an actual CPU-bound, consistent workload.
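A throwaway sketch anyone can paste into iex to verify this (pure computation, no IO; watch your CPU monitor while it runs):

```elixir
# One CPU-bound task per scheduler; all cores should sit at ~100%.
1..System.schedulers_online()
|> Enum.map(fn _ ->
  Task.async(fn ->
    # A tight loop of pure arithmetic, no messages, no IO.
    Enum.reduce(1..50_000_000, 0, fn i, acc -> rem(acc + i * i, 1_000_003) end)
  end)
end)
|> Task.await_many(:infinity)
```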

2 Likes

So you are using a mostly IO-bound workload to profile the CPU? Not maxing out the CPU may actually be a good sign. Because you are being vague on the details, I am free to interpret your data like this:

  • C# had to serve fewer requests because it maxed out the CPU
  • Elixir was able to serve more requests and have spare CPUs

If that’s the case, I will pick the second, thank you.

That’s why, when talking about benchmarks, we need numbers and methodology. There are hundreds of things that could go wrong, and even when the measurement is right, we can draw the wrong conclusions. So unless you can provide the applications, benchmark tools and methodology, there is nothing to conclude and nothing to discuss.

Can you please provide an actual example? Please let us know your OTP version and OS too. In literally years of benchmarking Elixir applications with tools like wrk, I have never brought one down, even when opening 2 million connections, where we used 40 different client machines to benchmark a single server.

It is not about bad requests bringing down the server but about how you react to them semantically. I have literally seen frameworks and libraries rescuing OutOfMemoryError and putting systems in an unmanageable state because of that.

Still, focusing on bad requests is a gross misrepresentation of what fault tolerance means in Elixir. It also drastically undervalues the benefits of processes in designing those systems. Some examples:

  • Ecto being fault-tolerant means safer design around connection pools (and leaking of connections)
  • Phoenix being fault-tolerant means we can easily multiplex multiple channels over the same websocket connection and save on system resources, while also scheduling CPU- and IO-bound work
  • Ecto being built on top of processes grants an excellent amount of visibility into the system: you can navigate process trees and inspect the pool’s memory, state, queue size and more (see the sketch after this list)
  • Phoenix being built on top of processes means no stop-the-world GC, only per-process garbage collection

And the list could go on and on.
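To illustrate the visibility point from the list above, all of the following works from a plain iex session; MyApp.Repo and MyApp.Supervisor are stand-in names for whatever your app registers:

```elixir
# Inspect any process by registered name (stand-in names below).
pid = Process.whereis(MyApp.Repo)
Process.info(pid, [:message_queue_len, :memory, :current_function])
# returns a keyword list: mailbox length, memory in bytes, current function

# Walk a supervision tree one level at a time.
Supervisor.which_children(MyApp.Supervisor)

# Or browse the whole system graphically (ships with OTP).
:observer.start()
```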

You did not. You made vague statements. “It isn’t fast”. For what? Compared to what? You said it “doesn’t maximise the CPU” but you didn’t provide an example workload. It fails during benchmarking? How? What errors? Under which scenarios?

Yet we see companies using it for data processing with GenStage and Flow. Or for the web with Phoenix. Or for embedded devices with Nerves. I recommend folks to look at videos from conferences such as Empex, ElixirConf, CodeBEAM and others to learn more about the variety of use cases BEAM is deployed to.

24 Likes

I wrote this blog post about avoiding GenServer bottlenecks. The best architecture is generally to model the natural concurrency of your system, and Elixir makes it really easy and safe to handle concurrency. The language is also a pleasure to write in.

8 Likes

Sorry for a bit of thread resurrection, but I took the initial post and enhanced it a bit with the discussions here, quoting some peeps plus some other comments, and posted it as a blog post.

Thanks for your input once again everyone!

3 Likes

I am actually still finding myself nodding in agreement with the title and your general premise – including the newer blog post.

I came for OTP. I stayed for the functional programming.

Additionally, Ecto and Phoenix already make very good use of OTP. So truthfully, if you use either (or both) you already are reaping the benefits of OTP, as stated by @cmkarlsson and others here.

4 Likes

having just Phoenix and Ecto be fault tolerant buys you nothing. Any modern web framework (regardless of language) can handle bad requests without bringing down the server

Fault tolerance isn’t just “what happens if there’s an exception”, it’s also how you handle load. If you can’t accept connections or if you take 30 seconds to reply, you’re effectively down.

What follows is my theoretical understanding.

Many web frameworks require running one OS thread per web request. E.g., you explicitly configure how many processes and threads per process to run if you’ve got a Rails app running on Puma. If you have 16 total threads and you get 17 simultaneous web requests, one of them is waiting in line. The 16th user is (hopefully) getting a nice response, and the 17th is hung entirely. Anyone making 20 requests at a time has got you with a denial-of-service attack.

Phoenix (via Cowboy) runs one BEAM process per request, and we know we can run millions of those. A million simultaneous web requests will probably hit some other bottleneck, like the database, of course, but Elixir itself will just keep adding processes as needed, each one slowing down the existing ones very slightly as they share scheduler time, giving a very smooth degradation under load, all without you having to think about it.
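A back-of-the-envelope sketch of that claim (run it in iex; the default VM limit is around 262k processes, so actual millions need the +P flag raised):

```elixir
# Spawn 100k idle processes and see how little memory the VM gains.
before_mb = div(:erlang.memory(:total), 1024 * 1024)

pids =
  for _ <- 1..100_000 do
    spawn(fn ->
      # Each process just parks until told to stop.
      receive do
        :stop -> :ok
      end
    end)
  end

after_mb = div(:erlang.memory(:total), 1024 * 1024)
IO.puts("100k idle processes cost roughly #{after_mb - before_mb} MB")

Enum.each(pids, &send(&1, :stop))
```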

5 Likes

@sync08: Did you ever figure out why you weren’t able to saturate your cores and why C# pinned them while doing IO?

Web frameworks have generally been moving away from that for quite a while now, basically since the C10K problem (a term coined in 1999) and later the C10M problem became a thing. This has sped up recently due to better OS support (epoll, IOCP) and even more so because of better language support (async/await in C#, for example).

Also, the 17th request can be handled by just starting a new thread (or OS process) in even the most primitive webserver. That may not be the most resource-efficient way to do it, but it certainly isn’t a denial of service. How do you think the current internet could even function if that were true?

See https://mrotaru.wordpress.com/2015/05/20/how-migratorydata-solved-the-c10m-problem-10-million-concurrent-connections-on-a-single-commodity-server/ for an example of doing 10 million concurrent connections with Java, from 2015. Note how the article is mostly about OS and network configuration and not at all about how to write the code, because that part is a solved problem.

1 Like

He’s not wrong. His example of Puma (16 threads) is the default for that webserver: https://github.com/puma/puma#thread-pool

You can set it to 16 or 16,000; either way, when you run out of threads, new requests block.

1 Like