Compiling Erlang got a lot slower - what to do? Parallelize?

We have a sizable amount of generated ASN.1 code (~780K lines) that we have little influence over, because it is generated by asn1ct, Erlang’s ASN.1 compiler. It is spread over 8 files.

When the .erl files are being compiled, I see that only one beam.smp is running, and it is saturating a single core:

top - 21:37:30 up 7 days,  6:22,  7 users,  load average: 0.99, 0.50, 0.30
Tasks: 772 total,   1 running, 769 sleeping,   1 stopped,   1 zombie
%Cpu(s):  2.0 us,  0.3 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26402664+total, 70938496 free, 12586772 used, 18050137+buff/cache
KiB Swap: 20971516 total, 20956016 free,    15500 used. 24979902+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29431 me        20   0 9245420 572804   6004 S 100.7  0.2   1:53.98 beam.smp
30263 me        20   0  160776   5184   3768 R   0.7  0.0   0:00.18 top
29611 me        20   0  506804 214708   4720 S   0.3  0.1   2:07.98 Parser-1:3
29616 me        20   0  503184 210664   4720 S   0.3  0.1   2:07.90 Parser-1:8
29626 me        20   0  506020 213592   4720 S   0.3  0.1   2:07.95 Parser-1:18
29632 me        20   0  507672 215620   4720 S   0.3  0.1   2:08.36 Parser-1:24
29633 me        20   0  505480 212600   4720 S   0.3  0.1   2:08.07 Parser-1:25
29642 me        20   0  506368 213724   4720 S   0.3  0.1   2:07.95 Parser-1:34
29643 me        20   0  504368 211868   4720 S   0.3  0.1   2:07.82 Parser-1:35
29646 me        20   0  507144 214836   4716 S   0.3  0.1   2:08.13 Parser-1:38
29652 me        20   0  504676 212232   4720 S   0.3  0.1   2:08.32 Parser-1:44
29656 me        20   0  501392 209188   4716 S   0.3  0.1   2:08.18 Parser-1:48
20442 me        20   0  113184   2748   2596 S   0.0  0.0   0:00.00 bash
20443 me        20   0  114668   4000   2252 S   0.0  0.0   0:02.24 bash
27041 me        20   0  155432   4380   3072 S   0.0  0.0   0:00.00 sshd
27169 me        20   0  125324  15280   3136 S   0.0  0.0   0:01.25 bash
29450 me        20   0    4384    680    576 S   0.0  0.0   0:00.00 erl_child_setup
29583 me        20   0  771860 554348   5540 S   0.0  0.2   0:23.80 Cooker
29609 me        20   0  507188 214700   4712 S   0.0  0.1   2:08.26 Parser-1:2
29612 me        20   0  503584 211508   4708 S   0.0  0.1   2:08.29 Parser-1:4
29613 me        20   0  505752 213632   4720 S   0.0  0.1   2:07.88 Parser-1:5
29614 me        20   0  500676 208760   4720 S   0.0  0.1   2:08.63 Parser-1:6
29615 me        20   0  506896 214844   4712 S   0.0  0.1   2:08.57 Parser-1:7
29617 me        20   0  507036 214692   4720 S   0.0  0.1   2:07.95 Parser-1:9
29618 me        20   0  504004 211380   4720 S   0.0  0.1   2:08.76 Parser-1:10
29619 me        20   0  503532 211412   4720 S   0.0  0.1   2:08.11 Parser-1:11
29620 me        20   0  504968 212848   4720 S   0.0  0.1   2:07.72 Parser-1:12

There are also lots of these Parser processes, but they don’t require many resources.

While this machine in particular is quite powerful, the servers this usually runs on have fewer cores, and there it takes more than 12 minutes to compile the 8 files. The single BEAM instance also has more than 9 GB of memory allocated, as the VIRT column in the top output above shows.

Is there a way to parallelize this better or is this actually not my problem?
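
One way to get more than one core busy, sketched here under assumptions, is to start a separate erlc process per file instead of compiling all 8 files inside one BEAM instance. The directory names `src` and `ebin`, the helper names, and the default of 4 jobs are made up for illustration; only `erlc -o <dir> <file>` is the real command line.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def erlc_command(src, out_dir="ebin", erlc="erlc"):
    """Build the argument list for compiling one .erl file with erlc."""
    return [erlc, "-o", str(out_dir), str(src)]


def compile_one(src, out_dir="ebin", erlc="erlc"):
    """Run one erlc process; return (source, exit code, wall-clock seconds)."""
    start = time.monotonic()
    result = subprocess.run(erlc_command(src, out_dir, erlc))
    return str(src), result.returncode, time.monotonic() - start


def compile_all(sources, out_dir="ebin", jobs=4, erlc="erlc"):
    """Compile the sources with at most `jobs` erlc processes at a time.

    Threads are enough here: each worker only waits on its own child
    process, so the real work happens in the separate erlc processes.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda s: compile_one(s, out_dir, erlc), sources))
```

Calling `compile_all(sorted(Path("src").glob("*.erl")), jobs=4)` would return one `(file, exit code, seconds)` tuple per file. Each erlc process is its own BEAM instance, so memory use multiplies with `jobs`; on the smaller servers the job count probably has to stay low. This only helps if the files do not depend on each other at compile time, which I have not verified for the generated ASN.1 modules.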

It’s just crazy: the total runtime of the build job has gone from a 5-minute average (with the same amount of Erlang code) to peaks of around 27 minutes, with no code base changes that could really account for it.

The main change is the tooling:

  • Elixir 1.6.4 to 1.10.4 and
  • OTP 20.2 to OTP 22.3.

IIRC people were asking on the Erlang mailing list about changes in compiler performance but this is a lot worse… Any ideas?

Since you are doing such large version bumps and upgrading multiple components at the same time, most of the answers will probably be guesses unless someone happens to know exactly what this issue is.

Try to narrow down where the issue comes from. It would help if you could isolate whether the cause is a change in Elixir or in Erlang. Try changing only the Elixir version or only the Erlang version, and try smaller version bumps.
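
A rough sketch of how that comparison could look: compile the same generated .erl file directly with the erlc binary from each OTP install and compare wall-clock times. Calling erlc directly keeps Elixir and Mix out of the measurement, so any difference should come from OTP alone. The install paths and the file name in the usage note are hypothetical, and the helper names are made up for this example.

```python
import subprocess
import tempfile
import time


def time_compile(erlc, src):
    """Compile src with the given erlc binary; return (seconds, exit code)."""
    # Write the .beam into a throwaway directory so runs do not interfere.
    with tempfile.TemporaryDirectory() as out_dir:
        start = time.monotonic()
        code = subprocess.run([erlc, "-o", out_dir, src]).returncode
        return time.monotonic() - start, code


def compare(toolchains, src, timer=time_compile):
    """Map each toolchain label to its (seconds, exit code) for the same file."""
    return {label: timer(erlc, src) for label, erlc in toolchains.items()}


def slowest(timings):
    """Return the label of the toolchain with the longest compile time."""
    return max(timings, key=lambda label: timings[label][0])
```

For example, `compare({"OTP 20.2": "/opt/otp-20.2/bin/erlc", "OTP 22.3": "/opt/otp-22.3/bin/erlc"}, "src/generated_asn1.erl")` would time both installs on one file (both paths and the file name are placeholders). If the timings are close, the Elixir side of the upgrade becomes the more likely suspect.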


Thank you. I will try to bump OTP forward to 22.3.4.1 first to see if the “compiler hang” issue is the root cause. Sadly, the problem shows up primarily on CI servers where only preapproved packages can be used, and that makes testing different combinations time-consuming. So fingers crossed for the “compiler hang”!