Anyone know why Elixir's CPU usage is so high in this benchmark?

benchmark

#1

This is an interesting article to read. Elixir’s performance, as usual, is excellent. However, the high CPU usage runs counter to what I would have expected.

Link to article: [Benchmarking Go vs Node vs Elixir](https://stressgrid.com/blog/benchmarking_go_vs_node_vs_elixir/)


#2

Elixir (well, the BEAM) does a LOT more than just opening, reading, and sending on sockets. Unlike Go, it also handles reliability between processes, supervision trees, etc. A single bit of bad code (which is dreadfully easy to write in Go due to its weak typing) can bring down the entire Go server, whereas that just won’t happen on the BEAM. In addition, the BEAM is also managing distribution data, immutable message copying between the actors (to help with that reliability), etc.

In short, the BEAM is a lot more heavyweight, yes, but it scales very well regardless, and it has reliability characteristics that Node and Go could (at this point) only dream of.

Overall, if I am going for absolute pure performance then I’d use Rust. Its performance, like that of well-crafted Go and well-crafted C/C++, is already about as fast as you can get, and with Rust you also get many guarantees that you don’t get with Go, C/C++, or even Elixir. However, you lose a lot of the distribution capabilities that are inherent to the BEAM itself.

If I’m going for something that I don’t want to go down, ever, the BEAM is unmatched with its combination of distribution abilities, supervision trees (OTP in general), and the ability to hot-code reload a running system.

In general: Node is both slower and unsafe. Go is faster but still unsafe. Elixir is more heavyweight but still both fast and safe, while being able to scale in ways unmatched elsewhere. Rust is both crazy fast and still very safe, not necessarily like Elixir’s safety but in a different way, just not as easy to distribute as Elixir.

It’s what I expect in other words. :slight_smile:


#3

Overmind, I realize that the BEAM is doing all these things for us, but for simple web requests this is too much overhead. I’ve seen other benchmarks where the cpu is barely touched.


#4

Benchmark author here. As I mentioned in the blog, the BEAM was very responsive while saturating the CPU. After talking to people, the theory is that a significant fraction of the CPU was used by busy waiting. In other words, to achieve better performance the BEAM would “burn” some CPU cycles before resorting to kernel synchronization. There is a VM setting that controls this: +sbwt. The next time I run this benchmark with Elixir I will specifically look at microstate accounting to confirm this theory.
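
For anyone who wants to check the busy-wait theory on their own node, here is a minimal sketch of collecting microstate accounting. It assumes the `runtime_tools` application (which ships `:msacc`) is on the code path, and the one-second sampling window is an arbitrary choice. As far as I know, with the default set of states busy waiting is folded into the `other` column, and a separate `busy_wait` state only shows up when the emulator is built with extra microstates.

```elixir
# Sketch: sample microstate accounting on the current node.
# Assumes runtime_tools (which provides :msacc) is available.

# Reset the counters, collect for 1_000 ms (arbitrary window), then stop.
:msacc.start(1_000)

# Print a per-thread table of where the time went.
:msacc.print()

# The raw data is also available as a list of maps, one per thread.
stats = :msacc.stats()
scheduler_threads = Enum.filter(stats, fn thread -> thread.type == :scheduler end)
IO.puts("scheduler threads sampled: #{length(scheduler_threads)}")
```

To compare against a run with scheduler busy waiting disabled, the flag can be passed at startup, for example `+sbwt none` in `vm.args` or `elixir --erl "+sbwt none" ...`.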


#5

kt315, is it possible that the 100ms of waiting was causing this issue?


#6

It would certainly add more need for synchronization, since the BEAM process would be waiting for the timeout on receive (this is what Process.sleep does). But this would also happen if that process called the database (likely with GenServer.call to a connection pool). All in all, if these are busy waits, higher CPU usage does not mean that the BEAM was overloaded in any way. Under higher load it would spend more CPU doing useful work and less busy waiting.
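
To make the "timeout on receive" point concrete, here is a small sketch of a sleep built from a `receive` with no clauses and an `after` timeout. `SleepSketch` and `sleep_via_receive/1` are hypothetical names used only for illustration, and the 100 ms value simply mirrors the wait discussed above.

```elixir
defmodule SleepSketch do
  # Hypothetical helper, not part of Elixir: blocks the calling process by
  # waiting in `receive` with no matching clauses until the timeout fires.
  def sleep_via_receive(timeout_ms) when is_integer(timeout_ms) and timeout_ms >= 0 do
    receive do
    after
      timeout_ms -> :ok
    end
  end
end

# Time the call to confirm the process was parked for roughly the timeout.
{elapsed_us, result} = :timer.tc(fn -> SleepSketch.sleep_via_receive(100) end)
IO.inspect(result, label: "result")
IO.puts("elapsed: #{div(elapsed_us, 1000)} ms")
```

While the process sits in that `receive`, it is not runnable, so the scheduler thread is free to run other processes or go idle.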


#7

I was thinking that this wouldn’t be the case for DB calls though as that is an async i/o call, right? I mean specifically for your benchmark results, the 100ms wait is resulting in a false cpu usage indicator. In the real world the db call wait wouldn’t affect the cpu at all. Or am I way off base?


#8

My understanding is that both a DB call and a sleep will take the current process off the BEAM scheduler thread in the same way.
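
One rough way to see this is to compare the status of a sleeping process with that of a process blocked in `GenServer.call`. In the sketch below, `SlowServer` is a hypothetical stand-in for a database connection, and the 200 ms and 50 ms delays are arbitrary. Both processes should report that they are waiting in a receive rather than running.

```elixir
defmodule SlowServer do
  use GenServer

  # Hypothetical stand-in for a slow database connection.
  def init(arg), do: {:ok, arg}

  def handle_call(:query, _from, state) do
    Process.sleep(200)
    {:reply, :ok, state}
  end
end

{:ok, server} = GenServer.start_link(SlowServer, nil)

# One process sleeps, the other blocks on a call to the slow server.
sleeper = spawn(fn -> Process.sleep(200) end)
caller = spawn(fn -> GenServer.call(server, :query) end)

# Give both processes time to block, then inspect their scheduling status.
Process.sleep(50)
sleeper_status = Process.info(sleeper, :status)
caller_status = Process.info(caller, :status)

IO.inspect(sleeper_status, label: "sleeper")
IO.inspect(caller_status, label: "caller")
```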


#9

Hm. I’m definitely not an expert on the vm, so hopefully someone with more knowledge will chime in. I don’t know why, but I assumed that the vm started like 10 additional threads for async i/o and that any networking call would fall under that.


#10

I think I’d clarify that Go is substantially less unsafe than Node…


#11

My rough estimate is that in CPU bound tasks Go might be an order of magnitude faster than Erlang, so a 4x higher CPU usage of Erlang doesn’t seem so surprising. But granted, this can possibly be improved with some tweaks. The max_keepalive value of 1000 seems somewhat low. @kt315 have you tried increasing it to a much larger value (e.g. 999_999_999)?


#12

my first thought as well (btw thanks to your blog post) - but the test is described as doing 100 requests within the keepalive - so 1_000 should be good…

> The client device opens a connection and sends 100 requests with 900±10% milliseconds in between each one.

I’m interested in what is taking the extra CPU time. I assume Process.sleep is the same as :timer.sleep, and is :rand.uniform expensive? Sure, you can move the req = … down into the :ok return, but that shouldn’t matter much, if at all. Is Elixir compiled against OTP 20, etc.?

btw @kt315 the +K VM arg was removed in OTP 21, see https://blog.erlang.org/IO-Polling/

that blog post does have some interesting info on +IOt

> Configure +IOt
>
> +IOt controls the number of threads that are used to deliver events. The default is 1 and it should be enough for most applications. However on very busy systems with many concurrent connection it could be beneficial to increase this. One way to get an indication of whether your system could benefit from it is by using msacc.

so could be interesting to tune that one and/or check msacc while the system is under stress…

btw great work on the benchmark - seems legit. (of course any benchmark is always synthetic)


#13

Oh, I missed that part. Yeah, the provided value should be good then.


#14

I once had a scenario like this, where the BEAM was taking up all of the CPU but was totally responsive. However, another process (nginx) running on the same machine was not responsive, I guess because the BEAM was already hogging the CPU. I’m interested in digging deeper.

In my case I figured out what was causing the problem: it was the Redix lib and some code smell in my implementation. I ended up removing the Redix dependency and re-implementing my server-side rendering engine (it was having all kinds of issues with deadlocks), and CPU usage is now extremely low and stays low. I think this is a different issue from the one in the benchmark.


#15

It would be really interesting to see how the Go server behaves when it saturates the CPU.


#16

Unsure what redix is but server side rendering in phoenix should not even remotely block like that!! o.O
Did you find the root cause in the phoenix server template renderer?!


#17

Yes, indeed! The plan is to do a benchmark that would hit the breaking point for each runtime.


#18

Sounds like a great test! Will you implement load shedding in the test framework to maintain responsiveness?


#19

Load shedding can be tricky. The test framework acts on delayed and aggregated metrics (for its own performance reasons), so it wouldn’t be very responsive.

An alternative approach is to collect high-resolution metrics during the ramp-up phase of the test. From these we can see when the system started showing signs of stress (increased latency, etc.) and in this way deduce its maximum capacity.


#20

Hey, yeah, I did. I wasn’t using Phoenix and it wasn’t regular template rendering. I was using my own GenServer to render a page with the Chrome remote interface, and basically ran into that issue.

Yes, I did resolve the problem by redesigning my implementation. I figured a lot of things out.