It looks like you have one or more processes that are touching a “large” binary (i.e. a binary > 64 bytes), but are not allocating data frequently enough to be garbage collected themselves.
A large binary is reference counted, instead of being copied across processes. The reference count is bumped by every process that touches such a binary. When a reference goes out of scope, the count is decremented only after a fullsweep GC takes place in that process. Until then, the ref count of the binary is > 0, and it’s kept in memory even if no one uses it. Therefore, if you have at least one process that touched a binary in the past, but is not allocating data frequently enough to trigger a “fullsweep” GC, you’ll end up with an excessive amount of garbage binaries.
A simple example could be a process that acts as a mediator. It receives a message, then dispatches it to another process, and does nothing more than that. It doesn’t allocate a lot of data on its own, so it’s going to be GCed less frequently. If the dispatched messages contain large binaries, the process touches a lot of large binaries, and can therefore be the cause of excessive dangling garbage.
You first need to identify such processes. Judging by your other output, it looks like they could be your `User` processes, but I can’t say for sure.
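One rough way to look for candidates is to ask each process which reference-counted binaries it currently holds. The sketch below assumes a hypothetical helper module named `BinaryUsage`; it relies on the `:binary` item of `Process.info/2`, which the Erlang docs describe as subject to change without notice, so treat the numbers as a hint rather than an exact measurement.

```elixir
defmodule BinaryUsage do
  # Hypothetical helper, not part of Elixir or any library.
  # Returns up to `limit` {pid, bytes} tuples, sorted by the total size of
  # the refc binaries each process currently references (largest first).
  def top(limit \\ 10) do
    Process.list()
    |> Enum.flat_map(fn pid ->
      # Process.info/2 returns nil if the process has exited in the meantime.
      case Process.info(pid, :binary) do
        {:binary, bins} ->
          bytes =
            bins
            # The same binary can be listed more than once; count it once.
            |> Enum.uniq_by(fn {id, _size, _refc} -> id end)
            |> Enum.map(fn {_id, size, _refc} -> size end)
            |> Enum.sum()

          [{pid, bytes}]

        nil ->
          []
      end
    end)
    # Negate the size to sort in descending order.
    |> Enum.sort_by(fn {_pid, bytes} -> -bytes end)
    |> Enum.take(limit)
  end
end
```

Calling `BinaryUsage.top(5)` in a remote shell should list the five processes holding the most binary data. To check whether a suspect is really holding garbage, you could force a collection with `:erlang.garbage_collect(pid)` and compare `:erlang.memory(:binary)` before and after.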
Once you know which processes are causing the problem, a simple fix could be to hibernate the process after every message. This is done by including `:hibernate` in the result tuple of `handle_*` callbacks (e.g. `{:noreply, next_state, :hibernate}`). This will reduce the throughput of the process, but can do wonders for your memory usage.
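As a minimal sketch of the hibernate approach, here is a mediator that forwards a payload and then hibernates. The module name `Mediator` and the `{:dispatch, payload}` / `{:payload, payload}` message shapes are made up for illustration; your real process will look different.

```elixir
defmodule Mediator do
  use GenServer

  # `target` is the pid that should receive the forwarded payloads.
  def start_link(target) do
    GenServer.start_link(__MODULE__, target)
  end

  @impl true
  def init(target), do: {:ok, target}

  @impl true
  def handle_cast({:dispatch, payload}, target) do
    # Forward the (possibly large) binary without keeping it in the state.
    send(target, {:payload, payload})

    # Hibernating garbage collects the process before it waits for the
    # next message, so the reference to the binary is dropped.
    {:noreply, target, :hibernate}
  end
end
```

With this in place, `GenServer.cast(mediator, {:dispatch, big_binary})` forwards the binary and the mediator releases its reference right away, at the cost of waking up from hibernation on every message.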
Another option is to set the `fullsweep_after` flag of the problematic process to zero or a very small value. I think that `GenServer.start_link(callback_module, init_arg, spawn_opt: [fullsweep_after: desired_value])` should do the job. For more explanation, look for `fullsweep_after` in the docs for the `:erlang` module.
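A small sketch of the `fullsweep_after` option, assuming a hypothetical `Forwarder` module and a value of `0` (pick whatever value suits your workload):

```elixir
defmodule Forwarder do
  use GenServer

  def start_link(arg) do
    # spawn_opt is passed on to the spawn call; fullsweep_after: 0 asks the
    # VM to do a fullsweep on every garbage collection of this process.
    GenServer.start_link(__MODULE__, arg, spawn_opt: [fullsweep_after: 0])
  end

  @impl true
  def init(arg), do: {:ok, arg}
end
```

You can check that the setting was applied by looking for `fullsweep_after` in the result of `Process.info(pid, :garbage_collection)`. Calling `Process.flag(:fullsweep_after, 0)` from `init/1` should be an alternative if you would rather not touch the `start_link` call.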