Elixir vs Go Performance (Anton Putra)

Wow, this topic is full of good insights. I have some good links and food for thought for creating some LinkedIn posts (I’m no longer active on Twitter) :).

For anyone wanting to watch the part 1 and part 2 videos while taking linked notes and creating bookmarks, I invite you to watch them at:

1 Like

What would those be? Some example titles?

For example:
Elixir vs Go performance: The pros, the cons, the trade-offs
Elixir vs Go: When resilience and fault-tolerance are more important than pure performance

1 Like

I was surprised by this failure rate too, because I was under the impression that the strongest point of the BEAM was availability under extreme loads, with expected performance degradation, of course.

It would be interesting to run the benchmark until one of the languages crashes and stops being available entirely. Of course Kubernetes will restart it, but I would like to know the breaking point of each. I suspect that the BEAM wouldn’t crash, and therefore wouldn’t require K8s to restart it, but I would expect Go to crash at some point.

Do the memory spikes of Go in the benchmark mean that it crashed and required a Kubernetes restart?

1 Like

I suppose this is why I don’t blog – these things look mega-obvious to me.

Filter bubble at work. :thinking:

1 Like

From what I could tell in the video, what this appears to demonstrate is that Kubernetes is throttling Elixir. I would prefer to see something like this run on an actual server. Otherwise, what this looks like to me is that Kubernetes is misconfigured. The drop in performance on the Elixir side strongly correlates with Kube throttling. I could be wrong here, but I regularly see Elixir services happily chugging along at 100% CPU for quite some time with no issue or degradation in performance.
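One way to check whether throttling is actually happening is to read the container’s cgroup CPU stats, since the kernel keeps per-cgroup counters of exactly this. A minimal Go sketch, assuming cgroup v2 (on cgroup v1 the file lives at /sys/fs/cgroup/cpu/cpu.stat and reports throttled_time in nanoseconds instead):

```go
// Minimal sketch: read this container's cgroup CPU stats to see whether the
// kernel's CFS quota is throttling it. Path assumes cgroup v2.
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.stat") // cgroup v2 path
	if err != nil {
		fmt.Fprintln(os.Stderr, "read cpu.stat:", err)
		os.Exit(1)
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		// nr_periods / nr_throttled / throttled_usec show how often the
		// scheduler paused this cgroup for exceeding its CPU quota.
		if strings.HasPrefix(line, "nr_") || strings.HasPrefix(line, "throttled_") {
			fmt.Println(line)
		}
	}
}
```

A non-zero and growing nr_throttled that lines up with the latency spikes would confirm the CFS quota as the culprit.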

4 Likes

I don’t have experience with Kubernetes. Can you elaborate a little more on why it throttles Elixir but doesn’t throttle Go?

I think the overall impression points to a configuration that was glossed over - fine-tuned for Go but not for the other ecosystems reviewed. That is my personal impression from this specific video and from other videos posted by that YouTuber.

1 Like

I don’t use Kube, so I’m sorta extrapolating a bit (and I read some PRs in the creator’s repo), but if you watch the video you can see how CPU throttling on Elixir spikes after it hits 100% CPU.

This correlates with big spikes in latency on Elixir. We don’t see that happening on Go, since it doesn’t hit 100% CPU.

AFAIK, this is a Kube tuning issue dealing with CPU quotas. Kube is effectively penalizing Elixir for being at 100% CPU.
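I haven’t checked the actual manifests in the repo, so treat this as a hypothetical sketch of the kind of spec that produces exactly this behaviour: once a container with a hard CPU limit runs flat out, the kernel’s CFS quota pauses it every scheduling period, which shows up as latency spikes.

```yaml
# Hypothetical Pod spec - names and values are illustrative, not from the repo.
apiVersion: v1
kind: Pod
metadata:
  name: elixir-bench
spec:
  containers:
    - name: app
      image: elixir-app:latest   # placeholder image
      resources:
        requests:
          cpu: "1"   # scheduling guarantee; generally safe to keep
        limits:
          cpu: "1"   # hard CFS quota: at 100% CPU the BEAM gets paused
                     # every scheduling period, i.e. throttled
```

A common mitigation in Kube tuning discussions is to keep the CPU request but drop (or raise) the limit, so a busy container can burst instead of being paused.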

8 Likes

If that is indeed the case, which makes sense to me, then some Kubernetes expert needs to open a PR in the repo with the fix, and the author needs to create part 3 of this benchmark :slight_smile:

Maybe @shanesveller can help here?

1 Like
  1. Only p90 is being tested/shown - way too low.
  2. A custom, simple, thrown-together client is being used - most likely suffering from CO (coordinated omission) - i.e. the results are incorrect/misleading.
  3. One system is mostly tested in overload at >100% CPU (due to 1 and 2, nothing is really measured before overload occurs - e.g. latency ~0?!), and the test is quickly stopped once the Go test reaches overload.
  4. The tests are out of sync; one test gets ahead of the other for a given time.

Remedy:

  1. Use p99, or better, measure the full distribution.
  2. Use a proper client like Gatling (it even has an easy-to-use/install JavaScript tool these days: Create your first JavaScript-based simulation).
  3. Keep going into overload for both systems.
  4. Solved by 2 - using a proper client.

So it’s all about using a proper load testing tool/client.
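To make the coordinated-omission point concrete, here is a rough Go sketch of an open-model client: requests fire on a fixed schedule regardless of how slowly the server answers, and latency is measured from each request’s intended start time, so a stalled server cannot hide the queueing delay it causes. The endpoint, rate, and request count are placeholders; a real tool like Gatling does all of this (plus warm-up, connection pooling, and proper histograms) for you.

```go
// Rough sketch of an open-model load generator that avoids coordinated
// omission. Target URL, rate, and total are placeholders.
package main

import (
	"fmt"
	"net/http"
	"sort"
	"sync"
	"time"
)

func main() {
	const (
		target = "http://localhost:8080/" // placeholder endpoint
		rate   = 100                      // requests per second
		total  = 1000                     // requests to send
	)

	interval := time.Second / rate
	start := time.Now()
	latencies := make([]time.Duration, 0, total)

	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < total; i++ {
		// The intended start time depends only on the schedule, never on how
		// long earlier requests took - this is what prevents coordinated omission.
		intended := start.Add(time.Duration(i) * interval)
		time.Sleep(time.Until(intended))

		wg.Add(1)
		go func(intended time.Time) {
			defer wg.Done()
			resp, err := http.Get(target)
			if err == nil {
				resp.Body.Close()
			}
			// Measure from the intended start, so queueing delay caused by a
			// slow or stalled server is charged to the result, not omitted.
			lat := time.Since(intended)
			mu.Lock()
			latencies = append(latencies, lat)
			mu.Unlock()
		}(intended)
	}
	wg.Wait()

	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	for _, p := range []float64{0.50, 0.90, 0.99} {
		fmt.Printf("p%g: %v\n", p*100, latencies[int(p*float64(len(latencies)-1))])
	}
}
```

A closed-loop client that only sends the next request after the previous one returns does the opposite: when the server stalls, it silently stops generating load, and the stall barely shows up in the percentiles.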

4 Likes