Extremely high memory usage in GenServers

Can you get recon from Hex, then try :recon.bin_leak(10) (http://ferd.github.io/recon/recon.html#bin_leak-1) when your system is using a lot of memory, and check 1) does usage go down a lot? and 2) what type of process shows up in the top list? (User vs Thought vs something else)
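
For reference, a rough sketch of how that might look (the version constraint below is only an example):

# mix.exs, in deps/0 (version constraint is just an example):
{:recon, "~> 2.5"}

# Then, in an IEx session attached to the running node while memory is high:
:recon.bin_leak(10)
# Returns the 10 processes that freed the most refc binaries after a forced GC,
# as {pid, delta, info} tuples (a negative delta means that many binaries were freed).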

This will help pinpoint whether GC really is the issue, and where. Another possible problem would be if you receive large binary messages, then extract a small sub-binary and keep it in the state; that sub-binary is really a pointer into the larger binary and prevents it from being garbage collected.
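
For illustration, a hypothetical handler showing how that can happen (the message shape and field names are made up):

def handle_info({:payload, large_binary}, state) do
  # `id` is a sub-binary: a small pointer into `large_binary`. Keeping it in
  # the state keeps the whole refc binary alive until this process is GCed.
  <<id::binary-size(16), _rest::binary>> = large_binary
  {:noreply, %{state | last_id: id}}
end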

You might find chapter 7 of Erlang in Anger useful (http://www.erlang-in-anger.com/).

1 Like

Yes, usage went down a ton after running that function actually (~1.2 GB to ~120 MB on my machine). Can you expand a little bit on what this does and what it means (and how I might be able to fix my problem using these results)?

The 10 processes listed were not my User/Thought processes; they were Phoenix.Endpoint.CodeReloader, :ssl_manager, Logger, and a few other :gen_servers that didn't have obvious names.

1 Like

Just for fun, I also called :recon.bin_leak(50), and the list was full of mostly values like this:

{#PID<0.22905.0>, -117,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},

I ran GenServer.call(pid(0, 23540, 0), :get), and can confirm that these are my User processes.

1 Like

Which function did you use to reduce memory?

1 Like

:recon.bin_leak/1 as suggested by @dom

1 Like

bin_leak forces a garbage collection on all processes, and measures how many reference-counted binaries were freed per process. So this confirms that lack of GC is the issue here.
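
For the curious, the forced sweep it performs is roughly equivalent to this, minus the before/after accounting:

Enum.each(Process.list(), &:erlang.garbage_collect/1)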

Some things you can do:

  • If you have operations that generate lots of refc binary garbage, do them in a separate, short-lived process linked to your long-lived user process, so the long-lived process doesn't accumulate garbage.
  • You can use a timer to hibernate the user process (see the GenServer docs) after N seconds of inactivity, or when you know it won't be getting messages for a while. The process will still be alive, but won't hold extra memory.
  • You can also use a timer to force a GC every N seconds (see the sketch after this list, which combines this with the hibernate idea).
  • ETS as mentioned can help. Each process can own a table, it doesn’t have to be shared. This is a nice article about the difference it makes: http://theerlangelist.com/article/reducing_maximum_latency
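
To make the two timer-based suggestions concrete, here is a minimal, hypothetical sketch (the module name, timeout values, and the :get call are made up, loosely based on the User processes discussed above):

defmodule MyApp.User do
  use GenServer

  @idle_timeout :timer.seconds(30)

  def init(state) do
    # Force a GC on ourselves every 60 seconds.
    :timer.send_interval(:timer.seconds(60), :force_gc)
    {:ok, state, @idle_timeout}
  end

  def handle_call(:get, _from, state) do
    {:reply, state, state, @idle_timeout}
  end

  def handle_info(:force_gc, state) do
    :erlang.garbage_collect(self())
    {:noreply, state, @idle_timeout}
  end

  def handle_info(:timeout, state) do
    # No messages for @idle_timeout ms: hibernate until the next message arrives.
    {:noreply, state, :hibernate}
  end
end
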
2 Likes

Reduce the amount of memory you allocate to the BEAM. Why try to force some GC if you do not need it?

2 Likes

If this is due to binaries being kept alive by references, as opposed to just GC not running on those processes:

If you can identify which binaries are being kept around (sounds like messages from the user that are being parsed out, with the interesting components stored in those maps?), then consider using :binary.copy/1 on the binary snippets before storing them in your GenServer’s state. This will create a deep copy of those binaries, freeing the original binary they were pulled from, and then the GC can do its job on those original binaries.
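
A hypothetical sketch of that (the message shape and the extract_name/1 helper are made up):

def handle_cast({:user_message, raw}, state) do
  name = extract_name(raw)          # a sub-binary pointing into `raw`
  name = :binary.copy(name)         # deep copy, so `raw` can be collected
  {:noreply, %{state | name: name}}
end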

1 Like

If binaries on the binary heap really were the problem, Observer wouldn't count them toward the GenServer's heap size.

1 Like

This is not true. The process will garbage collect when it reaches the (configurable) min_heap_size; if the garbage collection was not enough, it will allocate more memory to the heap.

2 Likes

It looks like you have one or more processes that are touching a “large” binary (i.e. a binary > 64 bytes), but are not allocating data frequently enough to be garbage collected themselves.

A large binary is reference counted, instead of being copied across processes. The reference count is bumped by every process that touches such a binary. When a reference goes out of scope, the count is decremented only after a fullsweep GC takes place. Until then, the ref count of the binary is > 0, and it's kept in memory even if no one uses it. Therefore, if you have at least one process that touched a binary in the past, but is not allocating data frequently enough to trigger a "fullsweep" GC, you'll end up with an excessive amount of garbage binaries.

A simple example could be a process that acts as a mediator. It receives a message, then dispatches it to another process, and does nothing more than that. It doesn't allocate a lot of data on its own, so it's going to be GCed less frequently. If the dispatched messages contain large binaries, the process touches a lot of large binaries, and can therefore be the cause of excessive dangling garbage.

You first need to identify such processes. Judging by your other output, it looks like they could be your User processes, but I can’t say for sure.

Once you know which processes are causing the problem, a simple fix could be to hibernate the process after every message. This is done by including :hibernate in the result tuple of handle_* callbacks (e.g. {:noreply, next_state, :hibernate}). This will reduce the throughput of the process, but can do wonders for your memory usage.
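
For example, using the :get call mentioned earlier in the thread, that would look something like:

def handle_call(:get, _from, state) do
  {:reply, state, state, :hibernate}
end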

Another option is to set the fullsweep_after flag of the problematic process to zero or a very small value. I think that GenServer.start_link(callback_module, arg, spawn_opt: [fullsweep_after: desired_value]) should do the job. For more explanation, look for fullsweep_after in the docs for the :erlang module.
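
One hypothetical way to wire that up in the callback module itself (the value 20 is arbitrary):

defmodule MyApp.User do
  use GenServer

  def start_link(arg) do
    # Do a fullsweep GC after every 20 minor collections instead of the default 65535.
    GenServer.start_link(__MODULE__, arg, spawn_opt: [fullsweep_after: 20])
  end

  # ... callbacks ...
end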

22 Likes

Wow, thanks for the really great explanation!

I can confirm that your hypothesis about the problem is correct. Adding :hibernate to the handle_calls that I figured were problematic made the memory usage of each User go down from ~4,000 KB to ~50 KB.

I'm going to do some benchmarking to make sure throughput isn't affected too much (I highly doubt it will be a problem in the Users), and then just use this as a temporary solution.

I think the next evolution of the app is to use ETS tables to store the data. Given that I want to scale up the number of these processes to the tens of thousands (and they are in groups, maybe a couple hundred groups could exist at a time?), what would be a good way to use ETS tables?

  • I’m assuming that having an ETS table per Thought and per User is not practical (there is a limit to the number of tables)
  • Maybe I could have a table for each group of thoughts and for each group of users, and then use a couple GenServers purely to route requests to the correct node? Would these processes become a bottleneck in any way?
  • The ETS tables could be deleted after a few minutes, as their data will be persisted in Postgres eventually

I know it’s hard to give a lot of input without more knowledge about the actual app, but any feedback on whether I’m thinking in the right direction would be helpful.

Thanks again for the awesome answer!

3 Likes

I’m thinking out loud right now: Could you use the Registry module to store the data?

I know it partitions data into multiple ETS tables, and cleans up on the process exit/crash.

You could have a process that serializes writes to the registry; after those few minutes you mentioned, the process can dump its Registry data into Postgres, and on exit the Registry will clean up.
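
A rough sketch of that idea, assuming a unique-keyed registry (the registry name and keys here are made up):

# In the supervision tree:
{Registry, keys: :unique, name: MyApp.DataRegistry}

# In the owning process (its entries are removed automatically when it exits):
{:ok, _} = Registry.register(MyApp.DataRegistry, {:thought, 42}, %{text: "hello"})

Registry.update_value(MyApp.DataRegistry, {:thought, 42}, fn data ->
  Map.put(data, :updated_at, DateTime.utc_now())
end)

# Any process can read it:
[{_owner_pid, data}] = Registry.lookup(MyApp.DataRegistry, {:thought, 42})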

Thoughts?

1 Like

It’s hard for me to give any specific advice, other than not to go for ETS unless you know you need it :slight_smile:

Usual cases for ETS involve multiple processes reading/writing the same data. Another example could be a process with a large active heap which is frequently changing. There are probably other cases, but these are the ones I can think of immediately, where ETS can improve perf/mem usage dramatically.

If you don’t have problems without ETS, then I’d say just stick with that :slight_smile:

4 Likes

I haven't really closely followed the discussion, so this might be a bit misplaced. But a common pattern for handling memory-expensive operations inside a GenServer is to spawn a separate process to do the processing. This means the GenServer's own heap does not grow excessively, and the memory used for the computation can be freed immediately, when the "operation" process terminates. You could even consider starting that process with a bigger initial heap to eliminate GC completely (though that might be risky and excessive without thorough measurement).

For example, it could look like this:

def handle_call(_req, _from, state) do
  # The memory-heavy work happens in the task's own heap, which is freed
  # as soon as the task exits.
  task = Task.async(fn ->
    # some computation
  end)
  {:reply, Task.await(task), state}
end

Or in case the response could be delivered asynchronously, even like this:

def handle_call(_req, from, state) do
  Task.start_link(fn ->
    # some computation producing the reply
    reply = :some_result
    GenServer.reply(from, reply)
  end)
  {:noreply, state}
end

4 Likes

Ok, I think that’s good advice :slight_smile:

I’ll go with the :hibernate option for now and continue monitoring latencies and usage, and only go to ETS down the road if necessary.

2 Likes

Ah, good idea. I had been wondering if something along those lines is practical or not.

1 Like

Does this actually help with large binary data though? So for example if you had

def handle_call({:run, some_binary}, _from, state) do
  task = Task.async(fn -> some_fun(some_binary) end)
  {:reply, Task.await(task), state}
end

The binary is touched by the GenServer. All of the work is done in the task, so the GenServer does very few allocations in its own process, which seems like it would be MORE likely to end up with this issue rather than less.

1 Like

The GenServer does few allocations, which means its heap is kept small - GCs will be more frequent, getting rid of the issue of holding on too long to the references to binaries. The binary leak is most prominent with processes that have huge heaps - this can happen if, for a normally "quiet" process, you have one infrequent operation that is extremely memory expensive. That operation will cause the heap to balloon, and will later keep GCs rare during regular operation, since there's still a lot of free memory left - causing the process to hold on to the binary references for longer than it should.

Keeping the overall heap of the process small will prevent it from holding on too long to those references.
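
If you want to watch this in practice, you can check a suspect process's heap with something like the following (sizes are reported in words, except :memory, which is in bytes):

# pid/3 is an IEx helper; the pid here is just an example taken from earlier output.
Process.info(pid(0, 22905, 0), [:memory, :heap_size, :total_heap_size])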

5 Likes

I'm not sure the binary references are released on every GC, though. The Erlang docs state that fullsweep_after should be set to 0 "if binaries that are no longer used are to be thrown away as soon as possible". So it seems that binaries are released only on a fullsweep (which by default happens after 65535 generational GCs), meaning that even if the heap is small, the binary could be released much later. That's just my theory though, didn't really try it out :slight_smile:
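
If someone wants to test the theory, one possible experiment (the pid is just an example; :erlang.garbage_collect/2 accepts a :type option for choosing a minor or major collection):

count_bins = fn pid ->
  {:binary, refs} = :erlang.process_info(pid, :binary)
  length(refs)
end

pid = pid(0, 22905, 0)                       # IEx helper; pick a suspect process
count_bins.(pid)                             # before
:erlang.garbage_collect(pid, type: :minor)   # generational (minor) collection
count_bins.(pid)                             # likely unchanged
:erlang.garbage_collect(pid, type: :major)   # fullsweep
count_bins.(pid)                             # unused refc binaries should be gone now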

3 Likes