Newxan
August 16, 2023, 8:18am
Background
Like many others, we’re currently using Alpine images for our deployments. Specifically, at the moment we rely on `hexpm/elixir:1.15.4-erlang-26.0.2-alpine-3.17.4`.
While looking into an unrelated issue we’re currently having, I stumbled upon Overrun stack and heap OTP-26.0 · Issue #7292 · erlang/otp · GitHub, which contained this report:
**Describe the bug**
Application down few seconds after the run release in AWS … eks
**To Reproduce**
Unfortunately I have no idea how to reproduce.
**Affected versions**
26.0
**Additional context**
AWS EKS
erlang 26.0
alpine 3.18.0
Application logs
```
hend=0x00007f4f6f34c7f0
stop=0x00007f4f6f34c670
htop=0x00007f4f6f34c678
heap=0x00007f4f6f349600
beam/erl_gc.c, line 735: <0.3141.0>: Overrun stack and heap
```
This made me curious, and since I didn’t want to pollute that issue with unrelated comments, I thought I would ask my question here instead.
Looking around the forum and elsewhere, I can’t find any information stating that performance differs depending on whether your system uses GNU libc or musl, and since Alpine images are very popular and rely on musl by default, I felt it warranted a post.
Question
Is there any performance difference between using an image such as Alpine, which relies on musl, and using, for example, an Ubuntu image with glibc, for production deployments? And is any such difference JIT-related?
Thank you in advance.
Yes, there’s a significant loss in performance with musl. I made a PR for OTP 27 that will remove these differences; the comment describes things in a bit more detail:
erlang:master ← jhogberg:john/jit/refactor-unix-sigaltstack/OTP-18568
Erlang code compiled to x86 native code uses `RSP` as its stack pointer. This improves performance in several ways:
- It permits the use of the x86 `CALL` and `RET` instructions, which reduces code volume and improves branch prediction.
- It avoids stealing a callee-save register to act as a stack pointer.
Unix signal handlers are by default delivered onto the current stack, i.e. `RSP`. This is a problem since our native-code stacks are small and may not have room for the Unix signal handler.
There is a way to redirect signal handlers to an "alternate" signal stack by using the `SA_ONSTACK` flag with the `sigaction(2)` system call. Unfortunately this has to be specified explicitly for each signal, and it is impossible to enforce given the presence of libraries.
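To make that mechanism concrete, here is a minimal, self-contained C sketch (an illustration, not BEAM source) of redirecting a handler onto an alternate stack: reserve a stack with `sigaltstack(2)`, then install the handler with the `SA_ONSTACK` flag:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void on_sigusr1(int sig) {
    (void)sig;
    /* Runs on the alternate stack rather than the (possibly tiny)
     * stack that RSP currently points at. write(2) is async-signal-safe. */
    const char msg[] = "handled on the alternate stack\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main(void) {
    stack_t ss;
    struct sigaction sa;

    /* Reserve a dedicated stack for signal delivery. */
    ss.ss_sp = malloc(SIGSTKSZ);
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) == -1) {
        perror("sigaltstack");
        return 1;
    }

    /* SA_ONSTACK is the crucial flag: without it the handler is
     * delivered on the current stack. */
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGUSR1, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }

    raise(SIGUSR1);
    free(ss.ss_sp);
    return 0;
}
```

As the PR text notes, the flag has to be set on every individual `sigaction(2)` call, which is why a library installing its own handlers can silently undo it.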
We used to attempt to override the C library's signal handler setup procedure with our own that added the `SA_ONSTACK` flag, but it only worked with `GNU libc`, which is not always the libc in use. As many of our users liked to run Docker images based on `Alpine`, which uses `musl` instead, they got needlessly bad performance without knowing it.
Instead, we now explicitly add `SA_ONSTACK` to our own uses of `sigaction(2)` and ignore the library problem altogether because:
1. We don't care about this problem on non-scheduler threads: if a library wants to fiddle around with signals on its own threads then it doesn't affect us.
2. We don't care about this problem when executing on the runtime stack: if a NIF or driver uses signals in a creative manner locally during a call, then that's fine as long as they restore them before returning to Erlang code.
A NIF or driver that doesn't do this is misbehaving to begin with and we can't shield ourselves against that.
3. If a library that we're statically linked to messes around with signals in the initialization phase (think C++ constructors of static objects), all of it will happen before `main` runs and we'll set things straight in `sys_init_signal_stack`.
If a dynamically linked library does the same, the same restrictions as ordinary NIF/driver calls apply to the initialization phase and the library must restore the signals before returning.
If any threads are created in either of these phases, they're still not scheduler threads so we don't have to care then either.
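Point 3 above mentions setting things straight in `sys_init_signal_stack`. Purely as an illustration of that fix-up step (hypothetical helper names and signal list, not the actual OTP code), the idea is to re-read each handler the runtime cares about after static initializers have run, and re-install it with `SA_ONSTACK` added:

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical sketch: re-register an already-installed handler with
 * SA_ONSTACK set, leaving default/ignored dispositions untouched. */
static void fixup_signal(int sig) {
    struct sigaction sa;

    if (sigaction(sig, NULL, &sa) != 0)
        return; /* can't query it; leave it alone */

    if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
        return; /* no handler installed, nothing to fix */

    if (!(sa.sa_flags & SA_ONSTACK)) {
        sa.sa_flags |= SA_ONSTACK;
        sigaction(sig, &sa, NULL);
    }
}

/* Called once from the main thread, before any scheduler threads
 * (and thus any Erlang code) exist. */
void fixup_signals_before_schedulers(void) {
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGUSR1 };
    for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        fixup_signal(sigs[i]);
}
```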