On my development VM (specs below), starting an Elixir/Erlang container takes up 2 GB of physical memory, while running iex locally takes only a few MB.
I only started an empty container with no modules loaded or code executed.
When more containers are started than the host has memory for, BEAM VMs already running in other containers crash so that the new instance can start.
(Both instances were started with the same docker run -it elixir.)
Other users could reproduce the excessive memory usage on Ubuntu Desktop (kernel v5.15), but Ubuntu Server (kernel v6.4) supposedly runs just fine.
(Both running docker 24.0.2)
So far I tried:
running different versions of the image (elixir:slim, elixir:alpine, …);
limiting memory usage in Docker → the container crashes with exit code 137 (out of memory);
checking for swap → swap is disabled on both devices;
running different versions of Docker.
Any ideas on this problem would be appreciated, since I’m unable to narrow it down to the BEAM, Docker, OS, or kernel level.
docker run --name elixirhexpm --rm -it hexpm/elixir:1.15.4-erlang-26.0.2-debian-buster-20230612-slim
Memory usage is 5MB; then I ran iex in the container. Memory usage is 2GB.
hexpm/elixir:1.15.4-erlang-25.3.2.4-debian-buster-20230612-slim (so OTP 25) has the exact same issue, but funnily enough the memory usage is 1.5GB instead of 2.0GB.
As said: I cannot reproduce it everywhere. My desktop has this issue, but my linux vm/vps server cannot reproduce it. So there is definitely something else (docker, containerd, linux kernel, ??) that affects this issue. I have only tested on amd64.
But @luechtdev obviously has the same issue on some of his linux systems.
I’m not enough of a Linux expert to know how to measure the memory usage correctly, but on my 8GB instance it is using 1.6% with iex open, which should be roughly 128 MB.
I created a new (arm) VM with Fedora 38 and can reproduce it there as well (Fedora 38 seems to be one of the distros with a new enough kernel/containerd/etc. to reproduce it).
docker run --name elixirhexpm --rm -it hexpm/elixir:1.15.4-erlang-26.0.2-debian-buster-20230612-slim iex
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b9f0d3e19106 elixirhexpm 0.00% 2.04GiB / 3.712GiB 54.95% 1.36kB / 0B 0B / 0B 21
[root@fedora-4gb-fsn1-2 ~]# uname -a
Linux fedora-4gb-fsn1-2 6.2.15-300.fc38.aarch64 #1 SMP PREEMPT_DYNAMIC Thu May 11 16:54:06 UTC 2023 aarch64 GNU/Linux
[root@fedora-4gb-fsn1-2 ~]# docker version
Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:37:12 2023
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:57 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Yup, I can reproduce on Manjaro (kernel 6.3.*) with both Erlang 26 and 25:
# Erlang 26
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
00fcdb049774 elixirhexpm 0.14% 2.043GiB / 31.18GiB 6.55% 8.23kB / 0B 0B / 0B 26
# Erlang 25
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
84462b9c494b elixirhexpm 0.15% 1.538GiB / 31.18GiB 4.93% 3.73kB / 0B 0B / 0B 26
However, on my Mac it isn’t happening:
# Erlang 26
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
5ece3af3e231 elixirhexpm 0.00% 62.81MiB / 23.48GiB 0.26% 1.11kB / 0B 283kB / 0B 58
# Erlang 25
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
b4d64a258d1d elixirhexpm 0.08% 57.21MiB / 23.48GiB 0.24% 806B / 0B 0B / 0B 58
My Docker-fu is pretty rusty, but I would guess this has something to do with how memory is managed or pre-reserved, or something similar. I’m not certain, though.
Erlang on most Unixes allocates a lot of virtual memory at startup, but only uses a small amount of it. If your OS physically reserves all virtual memory (or limits memory usage by looking at virtual memory), you can change how much Erlang reserves through the +MIscs switch.
The problem seems to be the default port limit shown by runtime_info/0.
In the cases with excessive memory usage, the ERL_MAX_PORTS value is set to 134,217,727, the upper end of the range that erl +Q accepts, instead of 1,024, the lower end.
docker run -it -e ERL_MAX_PORTS=1024 elixir fixes the problem – in my case – but it still is just a workaround and doesn’t really explain this behavior.
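To check whether the override actually took effect, the limit the VM picked can be read back with :erlang.system_info(:port_limit). The following is a minimal sketch, not something posted in this thread: the function name port_limit_in_container and the DOCKER override variable are made up for illustration, and it assumes the elixir image used above.

```shell
# Set DOCKER=echo to print the command instead of running it (dry run).
DOCKER=${DOCKER:-docker}

# Print the port limit the BEAM chose inside a fresh container.
# $1 (optional): value for ERL_MAX_PORTS; omit it to see the default.
port_limit_in_container() {
  max_ports=${1:-}
  if [ -n "$max_ports" ]; then
    "$DOCKER" run --rm -e "ERL_MAX_PORTS=$max_ports" elixir \
      elixir -e 'IO.puts(:erlang.system_info(:port_limit))'
  else
    "$DOCKER" run --rm elixir \
      elixir -e 'IO.puts(:erlang.system_info(:port_limit))'
  fi
}
```

Calling port_limit_in_container without an argument and then with 1024 on an affected host should show whether the two limits differ.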
The default ERL_MAX_PORTS is taken from the sysconf value of OPEN_MAX. So if you have that set very high on your system, the port table will use a lot of memory.
You can get your value (on Linux) like this:
$ getconf OPEN_MAX
1024
This behaviour is described in the docs for erl +Q.
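Since the value differs between the host and a container, it can help to print the limits in both places and compare them. A small sketch, assuming a POSIX shell; the function name show_nofile_limits is made up for illustration, and getconf is treated as optional because slim images may not ship it.

```shell
# Report the open-file limits that a VM started from this shell would inherit.
show_nofile_limits() {
  echo "soft nofile limit: $(ulimit -Sn)"
  echo "hard nofile limit: $(ulimit -Hn)"
  # getconf is not installed everywhere, so only call it when present.
  if command -v getconf >/dev/null 2>&1; then
    echo "getconf OPEN_MAX:  $(getconf OPEN_MAX)"
  fi
}

show_nofile_limits
```

Running this once on the host and once in a shell inside the container shows whether the container runtime hands out a much larger limit than the host shell has.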
Thank you very much for this clarification and the link to the Erlang documentation!
It didn’t immediately click for me that ERL_MAX_PORTS defaults to the file descriptor limit (ulimit -n).
With this information we know how to fix the issue, but it does not explain why it only happens on some systems and only when running Elixir/Erlang in a container. Then I stumbled upon this pull request for containerd: https://github.com/containerd/containerd/pull/7566
From this discussion it’s clear that at some version the file descriptor limit for containerd changed from a real limit (1_048_576) to unlimited (something also changed in systemd). With an unlimited descriptor limit, the +Q value mentioned above takes its maximum allowed value (134_217_727).
It’s also clear from that discussion that an unlimited number of file descriptors affects other software as well.
In my opinion the Erlang numbers are reasonable enough (even with the maximum number it will still work fine, it just uses 1GB of memory), but I hope containerd will revert to a “normal” limit. As shown here, for most people it’s not immediately obvious what causes this excessive memory usage or how to resolve it. If your software needs (very) high file descriptor limits, you’ll know, and you’ll have the knowledge to manually override the settings/flags for it.
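The 1GB figure is consistent with simple arithmetic if one assumes the port table keeps one 8-byte pointer per possible port on a 64-bit system. That slot size is an assumption for illustration, not something taken from the ERTS source, and the 1.5 to 2 GiB reported earlier suggests there is extra per-slot bookkeeping on top. The function name port_table_mib is hypothetical, and the maximum is rounded up by one to 2^27 to keep the division exact.

```shell
# Rough size in MiB of a table with one fixed-size slot per possible port.
# $1: number of slots
# $2: bytes per slot (8 = one 64-bit pointer, an assumption)
port_table_mib() {
  echo $(( $1 * $2 / 1048576 ))
}

port_table_mib 134217728 8   # prints 1024 (the unlimited case, 2^27 slots)
port_table_mib 1048576 8     # prints 8 (the earlier 1_048_576 limit)
```

Under the same assumption, the old containerd limit costs only a few MiB, which would explain why nobody noticed the table before the limit became unlimited.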
Thank you all, I’ve been losing my mind trying to figure out why my app suddenly needed 2GB of RAM when it wasn’t even close before. Turns out my Docker version got upgraded and I was suddenly hitting this issue. Setting ERL_MAX_PORTS fixed it!
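Because the root cause is the container's file descriptor limit, another option is to cap that limit per container with Docker's --ulimit flag rather than setting ERL_MAX_PORTS. This is an untested sketch and not something reported as working in this thread: the function name run_iex_capped and the DOCKER override variable are made up, and 65536 is an arbitrary illustrative default.

```shell
# Set DOCKER=echo to print the command instead of running it (dry run).
DOCKER=${DOCKER:-docker}

# Start iex in a container whose soft and hard open-file limits are capped.
# $1 (optional): the limit to apply; 65536 is an arbitrary default.
run_iex_capped() {
  limit=${1:-65536}
  "$DOCKER" run --rm -it --ulimit "nofile=${limit}:${limit}" elixir iex
}
```

Unlike ERL_MAX_PORTS, this also limits any other program in the container that sizes its tables from the descriptor limit, so it should only be used when the application does not need more descriptors than the chosen cap.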