Extremely high memory usage in GenServers

If binaries from the binary heap really were the problem, observer wouldn’t count them toward the GenServer’s heap size.

1 Like

This is not true. The process will garbage collect when it reaches the configurable min_heap_size; if that garbage collection does not free enough space, it will allocate more memory to the heap.

3 Likes

It looks like you have one or more processes that are touching a “large” binary (i.e. a binary > 64 bytes), but are not allocating data frequently enough to be garbage collected themselves.

A large binary is reference counted instead of being copied across processes. The reference count is bumped by every process that touches such a binary. When a reference goes out of scope, the count is decremented only after a fullsweep GC takes place. Until then, the ref count of the binary stays > 0, and it’s kept in memory even if no one uses it. Therefore, if you have at least one process that touched a binary in the past, but doesn’t allocate data frequently enough to trigger a “fullsweep” GC, you’ll end up with an excessive amount of garbage binaries.

A simple example could be a process that acts as a mediator. It receives a message, dispatches it to another process, and does nothing more than that. It doesn’t allocate a lot of data on its own, so it’s going to be GCed less frequently. If some of the dispatched messages contain large binaries, the process touches a lot of large binaries and can therefore be the cause of an excessive amount of dangling garbage.
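
For illustration, a minimal sketch of such a mediator (the module name and message shapes are made up):

defmodule Mediator do
  use GenServer

  def start_link(target), do: GenServer.start_link(__MODULE__, target)

  def init(target), do: {:ok, target}

  # The (possibly large) binary in `payload` is only forwarded, so this process
  # allocates almost nothing itself and is rarely garbage collected, yet it keeps
  # dead references to the binaries it forwarded until a fullsweep GC runs.
  def handle_cast({:dispatch, payload}, target) do
    send(target, {:payload, payload})
    {:noreply, target}
  end
end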

You first need to identify such processes. Judging by your other output, it looks like they could be your User processes, but I can’t say for sure.
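
If you want to confirm which processes are holding on to the most binary data, a quick-and-dirty diagnostic could look something like the snippet below (not production code, and the exact shape of the :binary info tuples is an implementation detail):

# Sum the sizes of the refc binaries each process currently references
# and print the ten biggest offenders.
Process.list()
|> Enum.map(fn pid ->
  case :erlang.process_info(pid, :binary) do
    {:binary, bins} -> {pid, bins |> Enum.map(fn {_addr, size, _refc} -> size end) |> Enum.sum()}
    _ -> {pid, 0}
  end
end)
|> Enum.sort_by(fn {_pid, total} -> total end, :desc)
|> Enum.take(10)
|> Enum.each(fn {pid, total} ->
  IO.inspect({pid, Process.info(pid, :registered_name), total})
end)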

Once you know which processes are causing the problem, a simple fix could be to hibernate the process after every message. This is done by including :hibernate in the result tuple of handle_* callbacks (e.g. {:noreply, next_state, :hibernate}). This will reduce the throughput of the process, but can do wonders for your memory usage.
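
For illustration, assuming a hypothetical User server, that could look like this:

# Only the trailing :hibernate atom is the relevant part;
# apply_update/2 and do_touch/2 are assumed helpers.
def handle_call({:update, attrs}, _from, state) do
  {:reply, :ok, apply_update(state, attrs), :hibernate}
end

def handle_cast({:touch, data}, state) do
  {:noreply, do_touch(state, data), :hibernate}
end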

Another option is to set the fullsweep_after flag of the problematic process to zero or a very small value. I think that GenServer.start_link(callback_module, init_arg, spawn_opt: [fullsweep_after: desired_value]) should do the job. For more explanation, look for fullsweep_after in the docs for the :erlang module.
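
In practice that could look something like this (the value 20 is arbitrary, so measure before settling on one):

# Sketch: pass spawn_opt through your own start_link wrapper.
def start_link(init_arg) do
  GenServer.start_link(__MODULE__, init_arg, spawn_opt: [fullsweep_after: 20])
end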

43 Likes

Wow, thanks for the really great explanation!

I can confirm that your hypothesis about the problem is correct. Adding :hibernate to the handle_call callbacks that I figured were problematic made the memory usage of each User go down from ~4,000 KB to ~50 KB.

I’m going to do some benchmarking to make sure throughput isn’t affected too much (I highly doubt it will be a problem in the Users), and then just use this as a temporary solution.

I think the next evolution of the app is to use ETS tables to store the data. Given that I want to scale up the number of these processes to the tens of thousands (and they are in groups, maybe a couple hundred groups could exist at a time?), what would be a good way to use ETS tables?

  • I’m assuming that having an ETS table per Thought and per User is not practical (there is a limit to the number of tables)
  • Maybe I could have a table for each group of thoughts and for each group of users, and then use a couple GenServers purely to route requests to the correct node? Would these processes become a bottleneck in any way?
  • The ETS tables could be deleted after a few minutes, as their data will be persisted in Postgres eventually

I know it’s hard to give a lot of input without more knowledge about the actual app, but any feedback on whether I’m thinking in the right direction would be helpful.

Thanks again for the awesome answer!

3 Likes

I’m thinking out loud right now: Could you use the Registry module to store the data?

I know it partitions data into multiple ETS tables, and cleans up on the process exit/crash.

You could have a process that serializes writes to the Registry; after those few minutes you mentioned, the process can dump its Registry data into Postgres, and on exit the Registry will clean up.
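
Very roughly, it could look something like this (untested; MyApp.Writes, MyApp.DataRegistry and MyApp.Persistence.persist_all/1 are made-up names, and the Registry itself would be started elsewhere in the supervision tree as {Registry, keys: :unique, name: MyApp.DataRegistry}):

defmodule MyApp.Writes do
  use GenServer

  # All keys are registered by this single process, so the Registry
  # cleans them up automatically if the process exits or crashes.
  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  def init(nil) do
    Process.send_after(self(), :flush, :timer.minutes(5))
    {:ok, nil}
  end

  def handle_call({:put, key, value}, _from, state) do
    case Registry.register(MyApp.DataRegistry, key, value) do
      {:ok, _owner} -> :ok
      {:error, {:already_registered, _}} ->
        Registry.update_value(MyApp.DataRegistry, key, fn _old -> value end)
    end

    {:reply, :ok, state}
  end

  def handle_info(:flush, state) do
    # Dump everything this process registered into Postgres; persist_all/1 is a placeholder.
    MyApp.DataRegistry
    |> Registry.select([{{:"$1", :_, :"$2"}, [], [{{:"$1", :"$2"}}]}])
    |> MyApp.Persistence.persist_all()

    {:stop, :normal, state}
  end
end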

Thoughts?

1 Like

It’s hard for me to give any specific advice, other than not to go for ETS unless you know you need it :slight_smile:

Usual cases for ETS involve multiple processes reading/writing the same data. Another example could be a process with a large active heap which is frequently changing. There are probably other cases, but these are the ones I can think of immediately, where ETS can improve perf/mem usage dramatically.

If you don’t have problems without ETS, then I’d say just stick with that :slight_smile:

5 Likes

I haven’t really closely followed the discussion, so this might be a bit misplaced. But a common pattern for handling memory-expensive operations inside a GenServer is to spawn a separate process to do the processing. This means the GenServer itself does not grow extensively in size, and the memory used for the computation can be freed immediately, when the “operation” process terminates. You could even consider starting that process with a bigger initial heap to eliminate GC completely (though that might be risky and excessive without thorough measurement).

For example, this could look like this:

def handle_call(_req, _from, state) do
  task = Task.async(fn ->
    # some computation
  end)
  {:reply, Task.await(task), state}
end

Or in case the response could be delivered asynchronously, even like this:

def handle_call(_req, from, state) do
  Task.start_link(fn ->
    # some computation producing `reply`
    GenServer.reply(from, reply)
  end)
  {:noreply, state}
end

9 Likes

Ok, I think that’s good advice :slight_smile:

I’ll go with the :hibernate option for now and continue monitoring latencies and usage, and only go to ETS down the road if necessary.

2 Likes

Ah, good idea. I had been wondering if something along those lines is practical or not

1 Like

Does this actually help with large binary data though? So for example if you had

def handle_call({:run, some_binary}, _from, state) do
  task = Task.async(fn -> some_fun(some_binary) end)
  {:reply, Task.await(task), state}
end

The binary is touched by the GenServer, but all of the work is done in the task, so the GenServer does very few allocations in its own process, which seems like it would make it MORE likely to end up with this issue rather than less.

1 Like

The GenServer does few allocations, which means its heap is kept small, so GCs will be more frequent, getting rid of the issue of holding on to binary references for too long. The binary leak is most prominent in processes with huge heaps. This can happen when a normally “quiet” process has one infrequent operation that is extremely memory expensive: that operation causes the heap to balloon, and afterwards GCs become rare during regular operation, since there’s still a lot of free heap left, causing the process to hold on to binary references for longer than it should.

Keeping the overall heap of the process small will prevent it from holding on too long to those references.

5 Likes

I’m not sure the binary references are released on every GC, though. The Erlang docs state that fullsweep_after should be set to zero “if binaries that are no longer used are to be thrown away as soon as possible”. So it seems that binaries are released only on a fullsweep (which happens by default after 65535 generational GCs), meaning that even if the heap is small, the binaries could be released much later. That’s just my theory though, I didn’t really try it out :slight_smile:
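
One way to test it would be to force a major GC on a suspect process with :erlang.garbage_collect/1 and compare the node’s binary memory before and after, e.g.:

# Quick check in iex; `pid` is whichever suspect process you picked.
before = :erlang.memory(:binary)
:erlang.garbage_collect(pid)   # defaults to a major (fullsweep) collection
freed = before - :erlang.memory(:binary)
IO.puts("freed #{div(freed, 1024)} KB of binary data")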

3 Likes

A bit late, but another useful newer feature is to start your gen_server with the hibernate_after option, such as:

{:ok, worker} = GenServer.start_link(module, args, hibernate_after: 5_000) 

This will ensure that once your worker has been idle for more than 5 seconds, it hibernates, which garbage collects everything it can.

19 Likes

Oh woah. Extremely useful!

1 Like