I noticed that while compiling an application, ParallelCompiler will compile multiple modules in parallel, as long as the dependencies between them allow it, but compiling mix dependencies does not work this way: each package is compiled one by one. While some packages really take advantage of all cores, others do not, and I wonder whether there are constraints that prevent Elixir from maximising parallelism across packages and making proper use of all cores.
I imagine that compiling all dependencies could be a multi-step process: first scanning the packages and sorting them by their dependencies on one another, and then, similar to GenStage and Flow, dispatching modules to be compiled as events to System.schedulers_online() consumers, making sure that each consumer is compiling at least one module.
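To make the idea concrete, here is a minimal sketch of that scheduling loop. This is purely hypothetical and not how Mix is implemented; `DepScheduler`, the `deps` map shape, and `compile_fun` are all invented for illustration. It compiles packages level by level, where each level contains only packages whose dependencies are already built, fanning work out across all schedulers:

```elixir
# Hypothetical sketch, not Mix's actual implementation.
defmodule DepScheduler do
  # `deps` maps each package to the packages it depends on, e.g.
  # %{phoenix: [:plug, :telemetry], plug: [:telemetry], telemetry: []}
  # `compile_fun` is whatever compiles a single package.
  def run(deps, compile_fun) do
    do_run(deps, MapSet.new(), compile_fun)
  end

  defp do_run(deps, done, compile_fun) do
    # Packages whose dependencies are all compiled are ready to go.
    ready =
      for {pkg, pkg_deps} <- deps,
          not MapSet.member?(done, pkg),
          Enum.all?(pkg_deps, &MapSet.member?(done, &1)),
          do: pkg

    case ready do
      [] ->
        :ok

      ready ->
        # Fan the ready packages out across all schedulers, roughly the
        # demand-driven dispatch GenStage/Flow would give you.
        ready
        |> Task.async_stream(compile_fun,
          max_concurrency: System.schedulers_online(),
          timeout: :infinity
        )
        |> Stream.run()

        do_run(deps, Enum.into(ready, done), compile_fun)
    end
  end
end
```

A strict level-by-level loop leaves some parallelism on the table (a package becomes ready as soon as its own deps finish, not when the whole level finishes), but it keeps the sketch short.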
I’m looking to hear other people’s thoughts on this, as well as to see how your dependency compilation times and CPU usage compare (time mix deps.compile --force), and whether they make optimal use of all your cores.
MacBook Pro 2019, 2.4GHz 8 core i9: mix deps.compile --force 231.52s user 25.97s system 317% cpu 1:21.12 total
You cannot monitor NIFs or other “external” build tools. So it could happen that you run two packages in parallel that each try to compile a NIF with make -j8 on a machine with 8 cores, and in the end it takes longer due to constant context switching between the two compilation processes.
Maybe we can’t monitor, or run, external build tools in parallel, but that shouldn’t (have to) mean that Elixir/Erlang-only packages couldn’t run in parallel?
I think it may make scheduling more complicated, and I had not thought about this aspect, but even if only parts are parallelised, that could already improve performance. Acquiring a ‘lock’ or exclusivity on system resources for external build tools, possibly configurable by the package author, might prevent the constant switching.
The problem is that you have no way to know whether a package can or cannot be compiled in parallel. In theory, an additional option could be added to the compilers to allow applications to be compiled independently, but I think it would be a little too much work for little gain, especially as files within each project are already compiled in parallel whenever that is possible.
If I understand correctly, using a compiler other than Elixir requires additional configuration in mix, which could be used to filter out the dependencies that can be compiled solely with the Elixir compiler; and by using LexicalTracker on those projects, both internal and cross-project dependencies could be tracked and resolved as they become available?
Even if a compiler had to require a global lock (e.g. be the only compiler running), I think it would still make better use of available compute resources if it was able to parallelise partially.
Thing is, I’m not sure how long average projects take to compile their dependencies. I can imagine speed improvements in compiling dependencies being irrelevant to many, as their cache is often enough to prevent subsequent recompilation. The reason for asking is only that I noticed that, while updating Elixir and Erlang, compilation used all cores, but recompiling the project barely used half of the available resources, and I wondered why.
Regardless of whether the gains will be big or small, I’d love to see more effort in further parallelising the building of Elixir projects, dependencies included. I am also aware of the problems @hauleth mentioned, but it shouldn’t be hard to just always compile packages that contain NIFs serially? IMO it’s pretty trivial to detect.
From then on, all pure Erlang/Elixir dependencies, and the Elixir project itself, should be fair game for maximally parallelised compilation.
I’d like to see all my 10/20 CPU cores maxed while building!
Oh really? How? A compiler can be defined for each different language. I could even write assembler in Elixir (because why f…reaking not). How would you detect that I am not doing something weird? Or that I am generating some magical files somewhere (even from Elixir itself) that are used somewhere else? I am not even sure how it would work with setting application environment variables during compilation. Due to the expressiveness of Elixir, the parallel compilation of different projects at once can go south enormously fast.
If I remember correctly, there are certain pieces of code that you have to write (in C) if you want to actually interface with a NIF. That is not hard to detect.
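Something like the following heuristic could work. To be clear, this is a sketch of the idea, not an official Mix API; `NifDetect` and the marker list are made up, and a real check would have to be more careful (e.g. packages that download prebuilt binaries). It flags a dependency as NIF-bearing if it ships native sources or uses the make-based compiler:

```elixir
# A rough heuristic, not an official Mix API: treat a dependency as
# NIF-bearing if it ships native build files or pulls in elixir_make.
defmodule NifDetect do
  @native_markers ["c_src", "Makefile", "makefile", "CMakeLists.txt"]

  def has_native_code?(dep_path) do
    Enum.any?(@native_markers, fn marker ->
      dep_path |> Path.join(marker) |> File.exists?()
    end) or uses_make_compiler?(dep_path)
  end

  # elixir_make shows up in the :compilers list of most packages
  # that build NIFs or ports, so its presence is a strong hint.
  defp uses_make_compiler?(dep_path) do
    mix_exs = Path.join(dep_path, "mix.exs")
    File.exists?(mix_exs) and File.read!(mix_exs) =~ ":elixir_make"
  end
end
```

As the reply above points out, this would miss packages that generate files in other weird ways, which is why an explicit opt-in/opt-out flag is probably safer than detection alone.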
You are right that there are code-generating cases, though; I hadn’t considered them. I guess for those cases I’d be fine if 4-5 NIF-relying packages compile a bit slower because they compete too much.
My point is, I get this little annoying feeling when I see projects with 400+ files compiling for 12 seconds on a beefy workstation while the CPU cores never pass 20-30% load each. You know what I mean. I wonder if there are low-hanging fruits to pick there.
True, true. I didn’t account for all possibilities. A manual switch in the package itself (has_nif: true, for example?) could also help. I can’t imagine it being that hard to introduce, defaulting to false if not present. As mentioned above, it might make several NIF packages fight for your CPU cores, but eh, is that a doomsday scenario?
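In a package author’s mix.exs it could look something like this. To be explicit: has_nif is the hypothetical option proposed above; nothing like it exists in Mix today:

```elixir
# Hypothetical mix.exs of a NIF-bearing package; `has_nif` does not
# exist in Mix today, it is the option proposed in this thread.
def project do
  [
    app: :my_nif_package,
    version: "0.1.0",
    compilers: [:elixir_make] ++ Mix.compilers(),
    # Tells the (imagined) parallel dep compiler to build this package
    # serially, since `make -jN` will already saturate the cores.
    has_nif: true
  ]
end
```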
It could even be an opt-in feature, allowing you to run mix deps.compile --parallel, and as we find cases that cause problems, we can tackle them. After some time, it could become the default, with a flag to disable it.
Looks very promising!
Or compile_parallel: false to disable it; or, as I mentioned, allow package authors to specify whether their package supports being compiled in parallel with others. Additionally, like extra_applications, you could specify an allow/block-list to override parallel compiler compatibility.
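On the consuming project’s side, that could look roughly like this. Again, compile_parallel and parallel_deps are invented names for the options discussed here, not real Mix configuration:

```elixir
# Hypothetical mix.exs of the consuming project; `compile_parallel`
# and `parallel_deps` are invented names, mirroring the style of
# options like `extra_applications`.
def project do
  [
    app: :my_app,
    version: "0.1.0",
    deps: deps(),
    # Opt out of parallel dependency compilation entirely...
    compile_parallel: false,
    # ...or override per dependency, allow/block-list style:
    parallel_deps: [
      allow: [:jason, :plug],
      block: [:bcrypt_elixir]
    ]
  ]
end
```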