Why is this system (running only a thousand simple GenServers) slow?

Given that CPU usage is at 100% and you don’t observe any I/O load, it’s quite possible that the time is being spent inside the PlayerServer processes themselves.

You could verify this by experimentally finding a smaller number of player servers that keeps your CPU below 100% (say at 90% or so). Then you would have enough CPU headroom to work with tools such as observer. If your player servers are constantly at the top of the Processes tab, that’s strong evidence that these are the processes consuming your CPU.
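If you can’t run the observer GUI (which you’d normally open with `:observer.start()`), a rough equivalent can be sketched from IEx. This is only a minimal sketch: `TopProcs` is a name I made up for illustration, and it ranks processes by reductions executed during a short window, which approximates CPU work but is not an exact measure of it.

```elixir
defmodule TopProcs do
  # Hypothetical helper: returns the `n` processes that executed the most
  # reductions during `interval_ms`, as {pid, reductions_delta, registered_name}.
  def by_reductions(n \\ 10, interval_ms \\ 1_000) do
    before = snapshot()
    Process.sleep(interval_ms)
    later = snapshot()

    later
    |> Enum.map(fn {pid, reds} -> {pid, reds - Map.get(before, pid, 0)} end)
    |> Enum.sort_by(fn {_pid, delta} -> delta end, :desc)
    |> Enum.take(n)
    |> Enum.map(fn {pid, delta} -> {pid, delta, name(pid)} end)
  end

  # Map of pid => total reductions so far; processes that died meanwhile
  # return nil from Process.info/2 and are skipped by the pattern match.
  defp snapshot do
    for pid <- Process.list(),
        {:reductions, reds} <- [Process.info(pid, :reductions)],
        into: %{},
        do: {pid, reds}
  end

  # Registered name if there is one, otherwise nil.
  defp name(pid) do
    case Process.info(pid, :registered_name) do
      {:registered_name, name} when is_atom(name) -> name
      _ -> nil
    end
  end
end
```

Calling `TopProcs.by_reductions(10)` a few times in a row and seeing the same player server pids at the top would point in the same direction as observer’s process list.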

Going further, you could use eprof to get some pointers about where most of your time is spent. This SO answer by Fred gives some quick-start pointers.
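To give an idea of what an eprof session might look like, here is a minimal sketch. `EprofSketch` and its arguments are hypothetical names; it assumes you already have the pids of a few player servers at hand and that the `:tools` application (which ships eprof) is available on your node.

```elixir
defmodule EprofSketch do
  # Hypothetical helper: profiles the given pids for `duration_ms`
  # and prints where the time went, aggregated over all of them.
  def run(pids, duration_ms \\ 5_000) do
    case :eprof.start() do
      {:ok, _pid} -> :ok
      {:error, {:already_started, _pid}} -> :ok
    end

    # Crashes here if tracing could not be enabled, which is fine for a sketch.
    :profiling = :eprof.start_profiling(pids)
    Process.sleep(duration_ms)
    :eprof.stop_profiling()

    # :total merges the results of all profiled processes into one table.
    :eprof.analyze(:total)
    :eprof.stop()
    :ok
  end
end
```

Profiling slows the traced processes down noticeably, so it’s probably best to pass only a handful of player servers rather than all thousand.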

If you’re able to isolate a sequential piece of code that causes your problem, you can drill into it further with the fprof Mix task (`mix profile.fprof`).
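If you’d rather drive fprof by hand from IEx, something along these lines should work. Again, just a sketch: `FprofSketch` and `suspect/0` are made-up stand-ins for whatever sequential code you’ve isolated, and the output file names are my choice.

```elixir
defmodule FprofSketch do
  # Hypothetical stand-in for the sequential code under suspicion.
  def suspect do
    Enum.reduce(1..10_000, 0, fn i, acc -> acc + rem(i * i, 7) end)
  end

  def run(fun \\ &__MODULE__.suspect/0) do
    # Trace a single call of `fun`; the trace goes to "fprof.trace"
    # in the current directory.
    :fprof.apply(fun, [])

    # Turn the raw trace into profiling data.
    :fprof.profile()

    # Write the report to a file, sorted by each function's own time.
    :fprof.analyse(dest: ~c"fprof.analysis", sort: :own)
  end
end
```

fprof traces every function call, so the profiled code runs far slower than normal. Keep the input small and treat the numbers as relative, not absolute.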

IME reading the output of these profilers will usually require some meditation, so don’t be surprised if you’re not immediately able to find the cause. But most often you should be able to get to the root cause of your bottlenecks, or at least narrow down the problematic area.

Combining these techniques with some cheap trickery, such as commenting out or stubbing suspicious pieces of the code, should help you find the problematic parts.
