Why is Elixir GenServer / IPC performance so different on some of my platforms?

arcadian · August 16, 2022, 8:57pm

I am seeing huge variations in GenServer performance on some of my platforms, and I think it is a result of messaging performance on different architectures. Has anyone seen this before, or does anyone have ideas about what is happening or how I could investigate it? I want to make sure that I put a fast system into production rather than a slow one.

Specifically, I wrote a short program that spawns an incrementing GenServer (it just keeps a count of how many times it is called) and ten Agents that each call that GenServer a million times. On my i5-9500 the program takes 9.5 seconds; on my i7-9800X it takes 48 seconds, about 5x longer. Both systems run the same OS (FreeBSD) and the same versions of Elixir and Erlang (built using kerl / kiex).
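Since the original code was not posted, here is a minimal, hypothetical reconstruction of the benchmark described above. It assumes the counter is incremented with a plain `GenServer.call/2` and that each Agent runs its loop inside its own process via `Agent.cast/2`. The module names `Counter` and `Bench` are made up for illustration; this is not the original program.

```elixir
defmodule Counter do
  # Hypothetical stand-in for the "incrementing GenServer": its only state
  # is the number of :increment calls it has handled.
  use GenServer

  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, :ok, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

defmodule Bench do
  # Starts one Counter and `callers` Agents; each Agent calls the Counter
  # `calls` times. Returns {elapsed_microseconds, final_count}.
  def run(callers \\ 10, calls \\ 1_000_000) when callers > 0 and calls > 0 do
    {:ok, counter} = Counter.start_link()

    {micros, :ok} =
      :timer.tc(fn ->
        agents =
          for _ <- 1..callers do
            {:ok, agent} = Agent.start_link(fn -> :idle end)

            # Agent.cast/2 returns immediately and runs the function inside the
            # Agent process, so all Agents hit the Counter concurrently.
            Agent.cast(agent, fn _state ->
              Enum.each(1..calls, fn _ -> Counter.increment(counter) end)
              :done
            end)

            agent
          end

        # An Agent handles its messages in order, so this get only returns
        # once that Agent's cast (its whole loop) has finished.
        Enum.each(agents, fn agent -> :done = Agent.get(agent, & &1, :infinity) end)
      end)

    {micros, Counter.value(counter)}
  end
end
```

Calling `Bench.run()` with the defaults reproduces the 10 × 1,000,000 call pattern and returns the elapsed time in microseconds together with the final count, which should equal `callers * calls`.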

Some thoughts: the i7-9800X is technically a Skylake-X system, which may have the PAUSE issue, and it also has hyperthreading; either could be part of the problem. Or perhaps there is a way to tune Erlang / Elixir so that it responds to messages with lower latency. Any ideas or suggestions would be welcome.

Exadra37 · January 31, 2025, 9:14am

I am really curious why this happened to you, but I don’t have a clue how to help you.

Maybe one reason you didn’t get any response to your question is that you didn’t share the code.

hubertlepicki · January 31, 2025, 9:46am

So let me get this right: you have ONE GenServer process that receives millions of messages from 10 other processes?

I know nothing about these processes, but in that scenario what would matter most is the CPU’s single-threaded performance, since I assume the work is concentrated in the one GenServer processing messages, and also the amount of cache on the CPU.
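One way to check whether that single GenServer really is the bottleneck is to sample its mailbox length while the load runs. With synchronous `GenServer.call/2` callers, each caller has at most one request outstanding, so a queue that stays close to the number of callers suggests they spend most of their time waiting on that one process. Below is a minimal sketch using `Process.info/2`; the `MailboxProbe` module name is hypothetical.

```elixir
defmodule MailboxProbe do
  # Hypothetical helper: samples the message-queue length of `pid`
  # `samples` times, sleeping `interval_ms` before each sample.
  # Each sample is an integer, or :dead if `pid` had already exited.
  def sample(pid, samples, interval_ms) when samples > 0 do
    for _ <- 1..samples do
      Process.sleep(interval_ms)

      case Process.info(pid, :message_queue_len) do
        {:message_queue_len, len} -> len
        nil -> :dead
      end
    end
  end
end
```

For example, given the GenServer’s pid as `pid`, starting `Task.async(fn -> MailboxProbe.sample(pid, 20, 100) end)` alongside the load collects twenty samples 100 ms apart, which can then be compared across the two machines.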

I don’t think this is a real-life scenario, by the way: you usually would not design in a single bottleneck like that, precisely because it is a bottleneck.