How to affect the number of cores Elixir compilation utilizes?

Hi.

We have a mid-sized project of roughly 75K lines of Elixir code, spread over 500+ files.

When we compile this project on a 16-core machine, we can see in Grafana that at most 4 cores are utilized. We would like to push this to at least 8 if possible.

The fascinating thing is that it caps out at 4 cores with no peaks beyond, making me wonder if there is some limit involved.

What controls how many cores are utilized and is there a cap, maximum, or default? I’ve been looking through compiler and mix documentation and have found nothing so far.

Thank you!

Elixir will use as many cores as there are scheduler threads available, which defaults to the number of cores in your system.

Elixir will try to parallelize the compilation of individual files as much as possible, but any file's compile-time dependencies have to be compiled before it, which can limit the number of files that can be compiled in parallel. If you have many such dependencies in your project, the compiler will not be able to use all cores.
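For example (the module and function names here are made up): invoking another module's macro at compile time forces that module to be compiled first, while a plain remote call does not. Imagine each module below living in its own file:

defmodule MyApp.B do
  defmacro double(x), do: quote(do: unquote(x) * 2)
  def triple(x), do: x * 3
end

defmodule MyApp.A do
  # Using B's macro means MyApp.B must be fully compiled before this
  # file, i.e. a compile-time dependency that serializes the two.
  require MyApp.B
  def run(x), do: MyApp.B.double(x)
end

defmodule MyApp.C do
  # A plain remote call is only a runtime dependency; this file can be
  # compiled in parallel with (or even before) MyApp.B.
  def run(x), do: MyApp.B.triple(x)
end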

There have been optimizations in recent Elixir releases that reduce the number of compile-time dependencies, so make sure you are running the latest version. @wojtekmach has also written about how to reduce the dependencies in your projects: https://dashbit.co/blog/speeding-up-re-compilation-of-elixir-projects and https://dashbit.co/blog/rewriting-imports-to-aliases-with-compilation-tracers.


Hi, @ericmj.
The machine is listed as having 16 virtual cores, being a cloud instance. Yet Elixir only utilizes 4 at any time.
With hundreds of files, I find it hard to believe there isn't some set of 5 or more that could be compiled independently.

The Elixir release is 1.10.4. We can't move beyond that right now for reasons having to do with the organization maintaining the infrastructure.

What system call or variables does Elixir evaluate to determine how many cores to use for compilation? I could query those to learn more about why this happens.

If you run iex, you will get output such as:

Erlang/OTP 23 [erts-11.0.3] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]

Interactive Elixir (1.12.0-dev) - press Ctrl+C to exit (type h() ENTER for help)

The number in [smp:12:12] will say how many scheduler threads are being used.
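You can also query this from inside the runtime - System.schedulers_online/0 and :erlang.system_info/1 report the scheduler counts, and the +S emulator flag is what sets them (the flag usage below is a sketch, adjust to your setup):

# Run in iex to see how many scheduler threads the VM was started with
# and how many are online - these are what the parallel compiler can use.
System.schedulers()                      # e.g. 16
System.schedulers_online()               # e.g. 16
:erlang.system_info(:schedulers_online)  # same value, via Erlang

# The +S emulator flag controls this, e.g. starting the VM with
#   ELIXIR_ERL_OPTIONS="+S 8:8" mix compile
# should cap it at 8 scheduler threads.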


$ /opt/elixir/x86_64/1.10.4/bin/iex
Erlang/OTP 22 [erts-10.7.2.1] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1]

Seems to be recognized as having 16 cores alright.

In that case the compiler will use 16 cores, unless compile-time dependencies restrict how much can be compiled concurrently.

So, in principle I could add a dozen or so .ex files with no external dependencies, all in the same folder. Would I expect the core utilization to go up, or is it less deterministic than that?

If it helps the experiment, I could generate some code to make sure compilation takes a while.

Another good test would be to try compiling a different project.

Also a good idea! Any good candidates that parallelize well?

If none come to mind, I would just generate modules with a thousand function heads each, like def add1(1), do: 2, for a long range of scalars.
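Something along these lines could generate such modules (a throwaway sketch; the output directory, module names, and counts are arbitrary):

# gen_parallel_test.exs - run with `elixir gen_parallel_test.exs`.
# Generates independent modules, each with a thousand function heads,
# so the compiler has plenty of files it can build in parallel.
out_dir = "lib/generated"
File.mkdir_p!(out_dir)

for i <- 1..20 do
  heads = Enum.map_join(1..1_000, "\n", fn n -> "  def add1(#{n}), do: #{n + 1}" end)

  contents = """
  defmodule ParallelTest#{i} do
  #{heads}
  end
  """

  File.write!(Path.join(out_dir, "parallel_test_#{i}.ex"), contents)
end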

Yes, core utilization should go up if you have enough files and they are large enough, at least for part of the compilation.

Totally understandable - but do give it a go on the latest stable Elixir, and perhaps even master, to see whether there are any gains there; potentially the entire solution is there… (and yes, deploying the “fix” will be a future thing then…)

Yes, maybe I can.

While I don't have control over the CI machine, maybe I could change the job script so that it pulls in the Elixir tarball, builds it, and then uses that one instead of the one we have deployed. Maybe. :wink:


Well, we eventually upgraded to 1.14.4 (OTP 25), and I remember expecting things to improve because the dependency analysis is said to have improved, but the load average stays at 4 on 16 virtual cores.

Ironically, I found this thread when trying to analyze the same problem four years later. :stuck_out_tongue: I just didn’t check back then when we upgraded, but I see load averages remain where they were, I guess.


Just to give a bit of an impression about the code base:

  • 700+ files
  • 450K lines of Elixir code
  • Biggest file: 302K lines (it's generated); about 380K lines in total are in generated files
  • Most files (of the remaining ~70K lines) range between 15 and 150 lines, though

So I guess I’ll look for a way to split the really large files, even though they would only occupy one core each, and see what happens. :man_shrugging:

Had some fun with the compiler.

I had a map of maps - it was (pretty-printed) 100K lines long. So I wanted to “optimize it.”

Instead of this:

%{ "A_KEY" => %{ group1_key1: "blurb",
                               ....},
   "ANOTHER_KEY" => ... 

I decided to do this:

%{ "A_KEY" => AnotherModuleInAnotherFile.A_KEY.group,
   "ANOTHER_KEY" => AnotherModuleInAnotherFile.ANOTHER_KEY.group
   ,,,}

# and in a separate file
defmodule AnotherModuleInAnotherFile.A_KEY do
  def group do
    %{ group1_key1: "blurb",
       ... }
  end
end

The map had basically 155 sub-maps which I put into separate files.

My expectation was that this would be a way to split this huge map sensibly. The sub-maps were completely constant, so I expected the implementations of the group/0 functions to simply return a pointer or reference to term storage, and this to become super-cheap in terms of compilation. (And to be done in parallel.)

Instead I learned of a new compiler error:

[screenshot of the compiler error]

The supposedly complex instruction was described as: {call_ext,0,{extfunc,'Elixir.MyModule.MY_KEY',group,0}}

Also, compilation went from roughly 90s (for a total of 300K lines) to more than ten minutes, at which point it got killed by a supervision process.

So much for my assumptions of how to make code compile in parallel. :laughing:

So I kept the huge maps, but split the file into three parts, and that's what I can sensibly do here, I guess.

Alternatively, I could put all 155 entries into :persistent_term with 155 generated put/2 calls and use that. The map literal is just a module attribute used by a function for lookup, so it could also simply refer to :persistent_term.get/1 without changing the interface.

I guess in that case the compiler would not be forced to resolve it the same way. (I assume the compiler simply tried to resolve the map literal just the same as before, but the ext functions made its work more complex…)
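A minimal sketch of that :persistent_term variant (the MyApp.Groups module name and keys are made up; the real code would generate one put/2 per sub-map and call load/0 once at application start):

defmodule MyApp.Groups do
  # Write each sub-map once, e.g. from the application start callback.
  def load do
    :persistent_term.put({__MODULE__, "A_KEY"}, %{group1_key1: "blurb"})
    :persistent_term.put({__MODULE__, "ANOTHER_KEY"}, %{group2_key1: "blah"})
    # ... one put/2 per sub-map, 155 in total
    :ok
  end

  # Same lookup interface as before, but reading from :persistent_term
  # instead of a huge map literal baked into the module.
  def lookup(key), do: :persistent_term.get({__MODULE__, key})
end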

Ever since :persistent_term, there isn't really any need to bake constants into macro-generated code literals anymore. It's way faster to write into, and you get the same (and sometimes better) performance for reads. And it's way easier to use.


Something to note: an average core load of 4 does not mean only 4 cores are used.

It could be that 16 are used, but they spend most of their time on IO, idling, or blocked on a lock somewhere. This is not a rare thing to happen.

CPU load is a pretty flawed metric these days for evaluating how well you are saturating your cores.

See CPU Utilization is Wrong and Linux Load Averages: Solving the Mystery as starting points.
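One way to see what the schedulers themselves are doing, independent of OS load averages, is the BEAM's scheduler wall-time statistics (a sketch to run in iex while a compile is in progress; the 5-second window is arbitrary):

# Enable scheduler wall-time accounting (off by default), then take two
# samples while the compiler is running.
:erlang.system_flag(:scheduler_wall_time, true)

sample = fn -> :lists.sort(:erlang.statistics(:scheduler_wall_time)) end
first = sample.()
Process.sleep(5_000)
second = sample.()

# Per-scheduler utilization over the window: active time / total time.
# A scheduler that is genuinely busy compiling will be close to 1.0.
for {{id, a0, t0}, {id, a1, t1}} <- Enum.zip(first, second) do
  {id, Float.round((a1 - a0) / (t1 - t0), 3)}
end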
