Elixir 1.19 compilation performance - have you noticed a difference?

Compilation time for our project degraded from ~26 seconds to ~32 seconds (~23% slower) :disappointed_face:.

Did anyone else compare compilation performance on their project?

$ time mix compile --force && elixir -v
Compiling 1397 files (.ex)
Generating Enaia.Cldr for 2 locales named [:en, :und] with a default locale named :en
Generated enaia app

real	0m25.922s
user	1m59.042s
sys	0m16.722s
Erlang/OTP 27 [erts-15.2.7.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.18.4 (compiled with Erlang/OTP 27)

$ time mix compile --force && elixir -v
Compiling 1397 files (.ex)
Generating Enaia.Cldr for 2 locales named [:en, :und] with a default locale named :en
Generated enaia app

real	0m32.299s
user	2m48.014s
sys	0m20.359s
Erlang/OTP 28 [erts-16.1.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.19.1 (compiled with Erlang/OTP 28)
3 Likes

The compiler is definitely doing more work overall in 1.19.x, so make sure you’re leveraging the MIX_OS_DEPS_COMPILE_PARTITION_COUNT env var, described here: elixir/CHANGELOG.md at v1.19 · elixir-lang/elixir · GitHub

If the compilation is still longer overall after setting that value to around half your cores, I’d open a bug, as I don’t think that’s expected.
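For example, an invocation along these lines (the partition count of 4 is illustrative, not a recommendation; tune it to your machine):

```shell
# Partition dependency compilation across 4 OS processes.
# The value 4 is an assumption; try roughly half your core count.
MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 mix deps.compile --force
```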

3 Likes

Also, you’re the freedom_formatter guy! Thanks for the trailing commas :heart:!

7 Likes

Can you try the --profile time option to mix compile? I suspect a type-system-related slowdown (you can also try the v1.19 branch; we just pushed more optimizations).

1 Like

Thanks. Unfortunately specifying MIX_OS_DEPS_COMPILE_PARTITION_COUNT=2 didn’t noticeably change the results.

1 Like

Thanks @josevalim. I’m not sure what to look for when using --profile time… There’s a lot of output :sweat_smile:. What can I do to help figure out the source of the slowdown? Do I need the output from 1.18, and what would I compare?

1 Like

Wait a second. Compilation took 21s but the Mix task took 33s?! Try running “mix compile --force” with MIX_DEBUG=1 and see which task is taking long.

1 Like

Right. There’s a sizeable delay that doesn’t have any output, even with MIX_DEBUG=1:

MIX_DEBUG=1 time mix compile --force --profile time
-> Running mix CLI
-> Running mix loadconfig (inside Enaia.MixProject)
<- Ran mix loadconfig in 4ms
-> Running mix compile --force --profile time (inside Enaia.MixProject)
-> Running mix loadpaths --force --profile time (inside Enaia.MixProject)
-> Running mix deps.loadpaths --force --profile time (inside Enaia.MixProject)
-> Running mix archive.check --force --profile time (inside Enaia.MixProject)
<- Ran mix archive.check in 0ms
<- Ran mix deps.loadpaths in 366ms
<- Ran mix loadpaths in 367ms
-> Running mix compile.all --force --profile time (inside Enaia.MixProject)
-> Running mix compile.phoenix_live_view --force --profile time (inside Enaia.MixProject)
<- Ran mix compile.phoenix_live_view in 0ms
-> Running mix compile.erlang --force --profile time (inside Enaia.MixProject)
<- Ran mix compile.erlang in 0ms
-> Running mix compile.elixir --force --profile time (inside Enaia.MixProject)
Compiling 1397 files (.ex)
[profile] ...
[profile] Finished cycle resolution in 1ms
[profile] Finished compilation cycle of 1648 modules in 19888ms
[profile] Finished group pass check of 1648 modules in 1395ms
<- Ran mix compile.elixir in 30639ms
-> Running mix compile.app --force --profile time (inside Enaia.MixProject)
Generated enaia app
<- Ran mix compile.app in 26ms
-> Running mix compile.surface --force --profile time (inside Enaia.MixProject)
<- Ran mix compile.surface in 1604ms
-> Running mix compile.yecc --force --profile time (inside Enaia.MixProject)
<- Ran mix compile.yecc in 0ms
-> Running mix compile.leex --force --profile time (inside Enaia.MixProject)
<- Ran mix compile.leex in 0ms
<- Ran mix compile.all in 32300ms
<- Ran mix compile in 32792ms
<- Ran mix CLI in 32848ms
       33.08 real       171.14 user        20.15 sys
1 Like

OK: the good news is that it is not compiler related; the bad news is that we don’t have instrumentation around the areas that are slow now.

When does the delay happen? Before it starts compiling each module? In the middle? At the end? My suggestion is to add some print statements around mix/compilers/elixir.ex and try to get a feel for where in the code the slowdown is.

Paying more attention to it, the mysterious delay is after the “compilation cycle” and before the “group pass check”:

[profile] Finished compilation cycle of 1648 modules in 19888ms
# At least 10 seconds passes
[profile] Finished group pass check of 1648 modules in 1395ms
<- Ran mix compile.elixir in 30639ms

Is this when the type checking is happening?

No. I have a hunch this is protocol consolidation. If you open up mix/compilers/elixir.ex, you will see each_cycle and after_compile callbacks, measure those!
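As a sketch of how those measurements could look (the CompileTimer module and the label are hypothetical, not part of Mix; :timer.tc/1 returns microseconds):

```elixir
# Hypothetical helper for timing a callback; you could wrap the each_cycle
# and after_compile invocations in mix/compilers/elixir.ex with it.
defmodule CompileTimer do
  def measure(label, fun) do
    {micros, result} = :timer.tc(fun)
    IO.puts("[timing] #{label} took #{div(micros, 1000)}ms")
    result
  end
end

# Stand-in workload, just to show the output shape:
CompileTimer.measure("after_compile callback", fn -> Enum.sum(1..1_000_000) end)
```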

2 Likes

I recommend ets (zmwangx/ets on GitHub, a command output timestamper) as a tool to show you how much each step of a command took (with the caveat that each step must output a new line in the terminal).

The incremental mode does not always work, so I just use the elapsed mode, i.e.

ets -s -c -f '[%T.%f]' ping google.com

I’ve used this many times to help me identify slow spots.

2 Likes

I pushed a commit that measures what is happening within those 10 seconds, so you can try the v1.19 branch. In a nutshell, we do two things:

  1. we write files to disk
  2. we consolidate protocols

Then let me know the times reported.

I don’t think it is the “writing to disk” bit, because that was happening before anyway, but I was able to optimize it regardless (a project with 1000 modules went from 1s to 333ms on my machine). I don’t think the gain will be noticeable in practice, though, unless you are using containers or network drives.

On the other hand, we did change considerably how protocol consolidation works on v1.19, so perhaps a regression there?


I created a project with 10 protocols with 200 implementations each, and the consolidation pass is 3 seconds:

protos =
  for i <- 1..10 do
    defprotocol Module.concat(Proto, "M#{i}") do
      def foo_bar(x)
    end

    Module.concat(Proto, "M#{i}")
  end

for j <- 1..200 do
  defmodule Module.concat(Foo, "M#{j}") do
    defstruct [:foo]

    for proto <- protos do
      defimpl proto do
        def foo_bar(x), do: x
      end
    end
  end
end

So in case both the writing and consolidation are fast in your case, you may need to add some debugging statements to find the root cause.


PS: I also saw that you changed Erlang/OTP versions. Try running v1.19 with the same OTP version to rule out an issue with OTP!

2 Likes

Thanks @josevalim!

Findings:

  • 1.19 slowdown confirmed to be in the “after compile callback”.
  • 1.20 introduces an additional slowdown in the “group pass check” (1.4 → 6.4 seconds).
  • OTP version not relevant.

Details:
I re-ran it in 1.18 just to confirm that the protocol consolidation / file writing was faster in 1.18:

[profile] Finished compilation cycle of 1648 modules in 20759ms
# About 3 seconds in 1.18, which matches the overall time
[profile] Finished group pass check of 1648 modules in 1069ms
<- Ran mix compile.elixir in 23795ms

I can also confirm that OTP has no impact (i.e. 1.19 with older OTP version is also slow)

Compiling with the latest changes gives the following output:

[profile] Finished cycle resolution in 0ms
[profile] Finished compilation cycle of 1648 modules in 21185ms
[profile] Finished writing modules to disk in 284ms
[profile] Finished after compile callback in 8216ms
[profile] Finished group pass check of 1648 modules in 6402ms
<- Ran mix compile.elixir in 36360ms

So “after compile callback” is indeed slow, but strangely “group pass check” is also slower in 1.20.

So I started from 1.19.1’s source, just added the “after compile callback” output, and got back the quicker “group pass check”:

[profile] Finished cycle resolution in 0ms
[profile] Finished compilation cycle of 1648 modules in 21974ms
[profile] Finished after compile callback in 9124ms
[profile] Finished group pass check of 1648 modules in 1398ms
<- Ran mix compile.elixir in 34360ms
7 Likes

So “after compile callback” is indeed slow, but strangely “group pass check” is also slower in 1.20.

It is slower because we are already doing even more inference on v1.20. We will probably need to improve the performance before we ship it, but we can postpone that conversation a bit.


Back to the issue at hand, I recommend measuring some times here:

And I assume maybe_consolidate is the slow one:

So you may need to break Mix.Compilers.Protocol down. Unfortunately I could not reproduce it by brute forcing (compiling hundreds of protocols), so there may be something project specific.

1 Like

To my surprise, Protocol.extract_protocols takes about 4.4 seconds while consolidate takes 3.7 seconds, so both of these are significantly slower.

Consolidation per protocol (de-parallelized for easy timing) doesn’t reveal any particular outlier:

14 IEx.Info
15 Hex.Solver.Constraint
16 List.Chars
30 Collectable
34 Enumerable
51 String.Chars
68 JSON.Encoder
71 Enaia.CRE.Deal.Timeline.Event
77 Browser.Ua
77 Cldr.DisplayName
79 Enaia.Comments.Commentable
81 Cldr.Chars
81 Scrivener.Paginater
83 Phoenix.HTML.FormData
85 Ecto.Queryable
85 FunWithFlags.Group
88 Bamboo.Formatter
88 Table.Reader
89 LogfmtEx.ValueEncoder
93 Enaia.ActivityTracker.Trackable
94 Appsignal.Metadata
94 EnaiaWeb.Live.Helpers.ColumnItemGenerator
94 Phoenix.HTML.Safe
97 Enaia.CRE.Account.Address
97 Enaia.SpaceRenderable
98 Enaia.Documents.PDFConvertible
98 Timex.Protocol
100 DBConnection.Query
104 Poison.Decoder
109 Number.Conversion
110 Phoenix.Param
113 FunWithFlags.Actor
116 Bandit.WebSocket.Socket
116 Plug.Exception
116 Timex.Comparable
118 Cldr.LanguageTag.Chars
130 Bandit.HTTPTransport
131 ExTwilio.JWT.Grant
133 Oban.Met.Value
135 Ecto.DevLogger.PrintableParameter
157 Solid.Block
167 Inspect
204 Solid.Matcher
204 Solid.Renderable
207 ExAws.Operation
215 Bandit.WebSocket.Frame.Serializable
236 Enaia.Errors.Reportable
262 Poison.Encoder
279 Bandit.HTTP2.Frame.Serializable
597 Jason.Encoder
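For reference, the de-parallelized per-protocol timing above could be reproduced with something like the following sketch (the _build glob is an assumption about a standard Mix project compiled in the dev environment):

```elixir
# Sketch: time Protocol.extract_protocols/1 and each protocol's consolidation
# sequentially. The ebin glob is an assumption about a typical _build layout.
paths =
  "_build/dev/lib/*/ebin"
  |> Path.wildcard()
  |> Enum.map(&to_charlist/1)

{extract_us, protocols} = :timer.tc(fn -> Protocol.extract_protocols(paths) end)
IO.puts("extract_protocols took #{div(extract_us, 1000)}ms")

for protocol <- protocols do
  impls = Protocol.extract_impls(protocol, paths)
  {us, _result} = :timer.tc(fn -> Protocol.consolidate(protocol, impls) end)
  IO.puts("#{div(us, 1000)} #{inspect(protocol)}")
end
```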

I don’t have time to dig much further today.

This is very weird indeed, because extract_protocols did not change at all! I can only think that somehow :beam_lib.chunks is way more expensive now? Do you see a large difference in .beam file sizes?
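To compare .beam sizes between builds, a quick sketch (the _build path is an assumption; run it once against each version’s build directory):

```elixir
# Sketch: sum .beam file sizes under a build directory so that two
# Elixir versions can be compared. The path is an assumption for a dev build.
beams = Path.wildcard("_build/dev/lib/**/*.beam")

total = beams |> Enum.map(&File.stat!(&1).size) |> Enum.sum()
IO.puts("#{length(beams)} beams, #{div(total, 1024)} KiB total")

# Largest files first, to spot outliers:
beams
|> Enum.sort_by(fn path -> -File.stat!(path).size end)
|> Enum.take(10)
|> Enum.each(fn path -> IO.puts("#{File.stat!(path).size} #{path}") end)
```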

PS: I pushed a new commit to the v1.19 branch if you want to validate this theory.

3 Likes

Thanks, and thanks for confirming that extract_protocols has not changed.
I wanted to double-check what’s going on in 1.18.4, but I get a compilation error in a dependency when I compile from source, which is very strange.
I probably won’t have time to check further until next Tuesday.

1 Like