Some Elixir/Erlang code doesn't use all logical CPUs on hyperthreaded machines

I recently tried the 1 billion row challenge in Elixir. I have an Intel i9 MacBook Pro with 8 hyperthreaded cores, so 8 physical cores and 16 logical cores.

My first pass used streams to read in the large file and sent chunks of the file out to GenServers (technically Agents, driven with Agent.cast) for processing. When I used File.stream! and Stream.transform, only 8 of my 16 logical cores were being used:

But when I used the lower-level :prim_file functions to read the file, all 16 cores were being used:


It turns out any use of the streaming functions causes this problem. I used Stream.unfold to manufacture lines of data, with no file access involved at all, and that also gave the 8-core behavior.
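For reference, the synthetic-data version was along these lines (a simplified sketch, not my exact code; the line format and chunk size are just illustrative, and dispatch/1 stands in for the round-robin Agent.cast described further down):

```elixir
# Manufacture "station;temperature" lines with no file I/O at all, batch
# them, and hand each batch off exactly as in the file-based version.
# dispatch/1 is a stand-in for the round-robin Agent.cast to a worker pool.
Stream.unfold(0, fn
  i when i >= 1_000_000 -> nil                               # stop after 1M lines
  i -> {"station_#{rem(i, 400)};#{:rand.uniform(999) / 10}\n", i + 1}
end)
|> Stream.chunk_every(10_000)
|> Stream.each(&dispatch/1)
|> Stream.run()
```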

I got the same behavior on a Windows 11 machine with an i7-12700H running Linux under WSL: 14 physical cores, 20 logical cores with hyperthreading.

Later I tried compressing my large buffers to see if copying less data across process boundaries would help, so I used the Erlang :zlib module (:zlib.gzip and :zlib.gunzip). When I used these functions, I also got the 8-core behavior.
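The compression was just a wrapper around the cast, something like this (a sketch; worker is one of the pool Agents, buffer is a binary chunk of the file, and process_chunk/2 stands in for the per-chunk aggregation):

```elixir
# Compress the chunk before it crosses the process boundary...
compressed = :zlib.gzip(buffer)

Agent.cast(worker, fn acc ->
  # ...and decompress it again inside the worker before aggregating.
  data = :zlib.gunzip(compressed)
  process_chunk(data, acc)
end)
```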

So, is this some kind of problem with Elixir or the BEAM, where certain code doesn't use all logical cores on a machine, or is this expected behavior? If it's expected, what are the criteria for executing on all logical cores vs. just the physical cores?

From the first image, it looks like the first 8 cores aren’t saturated, so perhaps it’s not moving to the logical cores.

It could be the case that, with the different file-loading mechanism, the 8 cores are actually saturating the disk IOPS. Can you try measuring CPU and disk at the same time? The other possibility is that there is some internal bottleneck in the Stream usage; however, your zlib experiment also seems to indicate that the disk was saturated.

But I also had the 8-core behavior when using Stream.unfold to generate data, no disk involved.

The Stream functions don't have any role to play in initiating concurrent behaviour themselves, so something else at a lower level (either your code or the runtime) has that responsibility.

One thing you might check is the output of System.schedulers_online(), since some libraries (like GenStage) use that value as their default concurrency. But you didn't mention any of those, so that's unlikely to be the issue.
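For example, comparing what the VM started with against what the host reports:

```elixir
# Quick checks in iex: schedulers the VM is using vs. CPUs the OS reports.
System.schedulers_online()
:erlang.system_info(:schedulers)
:erlang.system_info(:logical_processors)
:erlang.system_info(:logical_processors_available)
```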

When you say "sent chunks of the file out to GenServers" - how are these Agents being created, and how are you deciding how many of them to start?

Could you also try raw files from the Elixir code?
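Something roughly like this (a sketch; the module name and chunk size are arbitrary, and a real version also has to stitch together lines split across chunk boundaries):

```elixir
defmodule RawRead do
  @chunk_size 1_048_576  # 1 MiB per read; arbitrary

  # Open the file in raw mode (handled directly in the calling process,
  # no intermediate file-server process) and feed fixed-size binary
  # chunks to the given callback.
  def each_chunk(path, fun) do
    {:ok, fd} = :file.open(path, [:raw, :binary, :read])
    loop(fd, fun)
  end

  defp loop(fd, fun) do
    case :file.read(fd, @chunk_size) do
      {:ok, data} ->
        fun.(data)   # e.g. cast the chunk to one of the worker Agents
        loop(fd, fun)

      :eof ->
        :file.close(fd)
    end
  end
end
```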

The "long answer" is to have a look at my code here, where brc.ex uses :prim_file and brc_stream.ex uses File.stream!.

Short answer: I'm setting up a pool of :erlang.system_info(:logical_processors) Agents, each initialized via Agent.start_link with an empty Map, then round-robinning through the Agents and sending buffers (or lists of lines, in the case of brc_stream) with Agent.cast to process the data.

The Agents are used the same way in both the :prim_file and File.stream! approaches; only the way the file is read differs.
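In outline, both versions share a pool shaped something like this (a simplified sketch rather than the exact code from the repo; it assumes the usual 1BRC "station;temperature" line format):

```elixir
defmodule Brc.Pool do
  # One Agent per logical processor; each holds a map of
  # station => {min, max, sum, count}.
  def start do
    for _ <- 1..:erlang.system_info(:logical_processors) do
      {:ok, pid} = Agent.start_link(fn -> %{} end)
      pid
    end
  end

  # Round-robin dispatch: pick a worker by chunk index and cast the lines
  # to it, so the reader never blocks waiting on the workers.
  def dispatch(workers, index, lines) do
    worker = Enum.at(workers, rem(index, length(workers)))
    Agent.cast(worker, fn acc -> Enum.reduce(lines, acc, &add_line/2) end)
  end

  # Merge the per-worker maps once all chunks have been cast and processed.
  def results(workers) do
    workers
    |> Enum.map(&Agent.get(&1, & &1, :infinity))
    |> Enum.reduce(%{}, fn map, acc ->
      Map.merge(acc, map, fn _k, {mn1, mx1, s1, n1}, {mn2, mx2, s2, n2} ->
        {min(mn1, mn2), max(mx1, mx2), s1 + s2, n1 + n2}
      end)
    end)
  end

  # Assumes 1BRC-style lines, e.g. "Hamburg;12.3\n" (always one decimal).
  defp add_line(line, acc) do
    [station, temp] = String.split(String.trim_trailing(line), ";")
    t = String.to_float(temp)

    Map.update(acc, station, {t, t, t, 1}, fn {mn, mx, sum, n} ->
      {min(mn, t), max(mx, t), sum + t, n + 1}
    end)
  end
end
```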