Decrease CPU consumption for many processes (50k+)

ImNotAVirus · October 30, 2021, 6:00pm

Hi everyone,

I am working on a project where I have to spawn many processes (50k+).
Each process represents an entity that must move on a map.

The behavior of an entity is simple:

Each entity chooses a random coordinate within a radius around it.
Then it calculates the path to get there.
Once it arrives at its destination, it pauses for a random time between 850 and 1150 ms.
It then chooses a new random coordinate and starts again.
etc…
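
The loop described above could be sketched as one GenServer per entity. This is only an illustration, not the code from the repository: the module name `EntitySketch`, the radius of 5, and `find_path/2` (a pure-Elixir stand-in for the Rust NIF) are all assumptions, the random point is picked in a square rather than a true circle, and travel time is ignored.

```elixir
defmodule EntitySketch do
  use GenServer

  # Hypothetical search radius; the original post does not give a value.
  @radius 5

  def start_link(pos), do: GenServer.start_link(__MODULE__, pos)

  @impl true
  def init(pos) do
    # Trigger the first move right after the process has started.
    send(self(), :move)
    {:ok, %{pos: pos}}
  end

  @impl true
  def handle_info(:move, state) do
    target = random_target(state.pos, @radius)
    _path = find_path(state.pos, target)

    # Sleep without blocking: schedule the next move in 850..1150 ms.
    Process.send_after(self(), :move, pause_ms())
    {:noreply, %{state | pos: target}}
  end

  # Picks a point in the square of side 2 * radius centred on {x, y}.
  def random_target({x, y}, radius) do
    {x + Enum.random(-radius..radius), y + Enum.random(-radius..radius)}
  end

  def pause_ms, do: Enum.random(850..1150)

  # Stand-in for the NIF call: returns a trivial two-point path.
  def find_path(from, to), do: [from, to]
end
```

With this shape, each process is idle between moves and only wakes up when its timer message arrives.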

The pathfinding algorithm is written in Rust (NIF) for performance reasons

So I started to do my first tests with only 10k processes.
Unfortunately, I found that just having 10,000 spawned processes that look for a path and then wait was already permanently using between 45 and 55% CPU on my Windows machine.
Curiously, when I ran the same tests on WSL2, the same code used only 5 to 10% CPU.

This huge difference in CPU consumption is the first thing I can’t understand/explain.

I then tried to benchmark my pathfinding function using Benchee to see if that was the source of my problems.

I got these results:

Name            ips        average  deviation         median         99th %
astar      355.56 K        2.81 μs   ±863.72%        2.20 μs       18.60 μs

According to these results, if I take the average execution time of the function, calling the pathfinding function 10,000 times should take only 28.1 ms.

So normally, the CPU should not even reach 1% (except maybe when launching the application).
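
The estimate can be checked with a few lines of arithmetic. The sketch below assumes that each worker calls the pathfinding function about once per second (the midpoint of the 850 to 1150 ms pause) and that the Benchee average is representative, which the ±863.72% deviation makes uncertain.

```elixir
workers = 10_000
avg_call_us = 2.81      # Benchee average from the table above
avg_pause_ms = 1_000    # assumption: midpoint of the 850..1150 ms pause

# Time spent in pathfinding if every worker runs it once.
batch_ms = workers * avg_call_us / 1_000

# Share of a single core if that batch repeats once per pause interval.
one_core_pct = batch_ms / avg_pause_ms * 100

IO.puts("one batch: #{Float.round(batch_ms, 1)} ms")
IO.puts("share of one core: #{Float.round(one_core_pct, 2)} %")
```

This gives roughly 28 ms per batch and just under 3% of one core, which is below 1% of total CPU on a machine with four or more cores. The pathfinding calls alone therefore do not account for the observed load.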

This is the second thing I don’t understand: why the CPU is permanently busy.

Having never worked with so many processes, I don’t know where to start in order to debug such problems.

The code used for my tests and benchmark is available here: GitHub - ImNotAVirus/elixir_nif_example

This code has been simplified to include only the spawn of the 10k processes, the call to the pathfinding function and the pause of workers.

Thanks in advance

kip · October 30, 2021, 10:46pm

The BEAM implements busy wait to deliver a smoother and more predictable responsiveness. There are tuning parameters +sbwt, +sbwtdcpu and +sbwtdio that can change the default behaviour. This gist shows some example usage.

In general, a high reported CPU utilisation does not necessarily mean your system is under stress. Typically you don’t need to apply the tuning parameters. And all other things being equal, 50k processes isn’t at all unreasonable.
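
One way to see how busy the schedulers really are, as opposed to what the OS reports, is scheduler wall time. The sketch below uses `:erlang.statistics(:scheduler_wall_time)`; the module name `SchedulerLoad` and the sampling interval are assumptions made for illustration. As far as I know, time a scheduler spends spinning while waiting for work is not counted as active time here, so this number can be much lower than the OS CPU figure.

```elixir
defmodule SchedulerLoad do
  # Returns the fraction (0.0..1.0) of time the schedulers were busy
  # during the sampling interval, averaged over all reported schedulers.
  def sample(interval_ms \\ 1_000) do
    # Wall-time accounting is off by default and must be enabled first.
    :erlang.system_flag(:scheduler_wall_time, true)

    before = Enum.sort(:erlang.statistics(:scheduler_wall_time))
    Process.sleep(interval_ms)
    later = Enum.sort(:erlang.statistics(:scheduler_wall_time))

    # Each entry is {scheduler_id, active_time, total_time}.
    {active, total} =
      before
      |> Enum.zip(later)
      |> Enum.reduce({0, 0}, fn {{_, a0, t0}, {_, a1, t1}}, {a, t} ->
        {a + (a1 - a0), t + (t1 - t0)}
      end)

    active / total
  end
end

IO.puts("scheduler utilisation: #{Float.round(SchedulerLoad.sample() * 100, 1)} %")
```

Running this inside the application while the workers are active shows whether the schedulers are doing real work or mostly waiting.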

mpope · October 30, 2021, 11:47pm

Instead of long running GenServers, have you considered using short lived Tasks? You can use one GenServer to launch these tasks, or can partition the entities across several if one scheduling GenServer becomes a bottleneck. You’d have to store the state in an ETS table indexed by an ID.
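
A minimal sketch of that idea, assuming a single scheduling GenServer and a public named ETS table. The names `TickScheduler` and `:entities`, the starting position `{0, 0}`, and the step size are hypothetical, and the pathfinding call is left out.

```elixir
defmodule TickScheduler do
  use GenServer

  @table :entities

  def start_link(count), do: GenServer.start_link(__MODULE__, count)

  @impl true
  def init(count) do
    # Public table so that the short-lived tasks can read and write it.
    :ets.new(@table, [:named_table, :public, :set])

    for id <- 1..count do
      :ets.insert(@table, {id, {0, 0}})
      schedule(id)
    end

    {:ok, nil}
  end

  @impl true
  def handle_info({:move, id}, state) do
    # The work runs in a throwaway process; the scheduler only keeps time.
    Task.start(fn -> move(id) end)
    schedule(id)
    {:noreply, state}
  end

  # Reads the entity's position from ETS, moves it, and writes it back.
  def move(id) do
    [{^id, {x, y}}] = :ets.lookup(@table, id)
    target = {x + Enum.random(-5..5), y + Enum.random(-5..5)}
    :ets.insert(@table, {id, target})
    target
  end

  defp schedule(id) do
    Process.send_after(self(), {:move, id}, Enum.random(850..1150))
  end
end
```

Here the only long-lived process is the scheduler. If it becomes a bottleneck, the entity IDs could be split across several such schedulers, as suggested above.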

NduatiK · October 31, 2021, 6:11am

Whoa! This is amazing. I am working on something similar for a Stochastic Processes class. Simulating vehicles for traffic light optimization.

I was planning to use libgraph, but this NIF-based approach has me thinking about performance at scale. Will things work when I jump to more processes? With NIFs, probably.

Thanks and good luck!

PS. LiveView + Surface + SVGs are amazing! I probably would have gone with Elm without them, but so far so good.

ImNotAVirus · October 31, 2021, 7:43am

Thanks for your answers everyone.

I did not know about the BEAM's busy waiting.
So I tested with it deactivated and got slightly better results.

Here is what I got (on Windows):

10,000 workers: between 30 and 35%
50,000 workers: between 45 and 60%
100,000 workers: between 70 and 95% (which sometimes pushes my CPU to 100%, because the BEAM is not the only thing running)

Do you think it would be possible to improve this?
I don’t know if it’s possible, but having 50k processes under 20% CPU and 100k under 50% would be perfect (or at least being able to run 100k processes without reaching 100%).

ImNotAVirus · October 31, 2021, 7:45am

Using short-lived Tasks was one of the first things I thought of. But the problem is that the entities will have to be able to interact with each other later on (for example, if one of them enters the field of action of another one).

That’s why I preferred to make one GenServer per entity.