Extremely high memory usage in GenServers

I’m deep into debugging a very high memory usage problem in a group of GenServers.

There are two types of GenServer implementations I’m examining:

# module:
MessageEngine.Thought

# example state:
%DB.Thought{__meta__: #Ecto.Schema.Metadata<:loaded, "thoughts">, active: false,
 score: 0.35795454545454547,
 conversation: #Ecto.Association.NotLoaded<association :conversation is not loaded>,
 conversation_id: 1621, id: 129158,
 inserted_at: #Ecto.DateTime<2017-03-07 21:32:19>,
 lost_against: %{"129129" => [51952, 51955, 51955, 51938, 51931, 51951, 51944], ...},
 message: #Ecto.Association.NotLoaded<association :message is not loaded>,
 message_id: 12748,
 text: "Yes because will she listen to them or the people.",
 updated_at: #Ecto.DateTime<2017-03-07 21:44:21>,
 user: #Ecto.Association.NotLoaded<association :user is not loaded>,
 user_id: 51959, vector: [],
 won_against: %{"129129" => [51946, 51934, 51934, 51942, 51954, 51957], ...}}
# module:
MessageEngine.User

# example state:
%DB.MessageUser{__meta__: #Ecto.Schema.Metadata<:loaded, "messages_users">,
 accepting_choices: false,
 all_choices: [%{"c" => 129138, "nc" => 129154}, ...],
 comparisons: [%{"a" => 129138, "b" => 129154},  ...],
 conversation: #Ecto.Association.NotLoaded<association :conversation is not loaded>,
 conversation_id: 1621, id: 132055,
 inferred_choices: [%{"c" => 129138, "nc" => 129154}, ...],
 manual_choices: [%{"c" => 129138, "nc" => 129130}, ...],
 message: #Ecto.Association.NotLoaded<association :message is not loaded>,
 message_id: 12748, rid: nil,
 user: #Ecto.Association.NotLoaded<association :user is not loaded>,
 user_id: 51959}

I don’t want to dig too deeply into why the states are what they are, but suffice it to say that they have been well researched and tested, and I don’t want to get into too much industry context 🙂

Now, we have been monitoring our app in production for a while, and noticed that as the number of these processes alive increases, memory usage goes up almost exponentially.

With 600 MessageEngine.Users and 600 MessageEngine.Thoughts, we measured almost 35GB of RAM being used across the cluster.

I first tried to measure the amount of memory used just by the state of the process, but that doesn’t seem like nearly enough data to have that substantial an impact.
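
For reference, this is roughly how the comparison can be done (a minimal sketch; the MemCheck module and the pid lookup are just illustrative):

defmodule MemCheck do
  # Rough comparison of a GenServer's state size vs. the memory the whole
  # process holds. :erts_debug.size/1 returns the flat size of a term in
  # words; Process.info/2 reports what the process is actually using.
  def report(pid) do
    state = :sys.get_state(pid)
    word_size = :erlang.system_info(:wordsize)

    state_bytes = :erts_debug.size(state) * word_size
    {:memory, process_bytes} = Process.info(pid, :memory)

    %{state_bytes: state_bytes, process_bytes: process_bytes}
  end
end

# in iex, with a pid taken from observer or the registry:
# MemCheck.report(pid(0, 23540, 0))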

I popped into observer to learn more, and ran the following tests:

30 users and 30 thoughts

  • With:
    length(MessageEngine.User.all_choices) = 0
    length(MessageEngine.User.manual_choices) = 0
    length(MessageEngine.User.inferred_choices) = 0
    length(MessageEngine.User.comparisons) = 0

One MessageEngine.User process was consuming 139kb of memory
One MessageEngine.Thought process was consuming 3kb of memory

  • With:
    length(MessageEngine.User.all_choices) = 53
    length(MessageEngine.User.manual_choices) = 20
    length(MessageEngine.User.inferred_choices) = 33
    length(MessageEngine.User.comparisons) = 53

One MessageEngine.User process was consuming 502kb of memory
One MessageEngine.Thought process was consuming 25kb of memory

300 users and 300 thoughts

  • With:
    length(MessageEngine.User.all_choices) = 0
    length(MessageEngine.User.manual_choices) = 0
    length(MessageEngine.User.inferred_choices) = 0
    length(MessageEngine.User.comparisons) = 0

One MessageEngine.User process was consuming 1089kb of memory
One MessageEngine.Thought process was consuming 6kb of memory

  • With:
    length(MessageEngine.User.all_choices) = 53
    length(MessageEngine.User.manual_choices) = 20
    length(MessageEngine.User.inferred_choices) = 33
    length(MessageEngine.User.comparisons) = 53

One MessageEngine.User process was consuming 4023kb of memory
One MessageEngine.Thought process was consuming 41kb of memory

So, as you can see, not only does memory usage per process scale up a lot just by adding ~50 maps to a list, the usage of each process also seems to depend on the number of processes alive! An order of magnitude increase in the number of processes results in an order of magnitude increase in the memory usage of each one.

This seems like really weird behavior to me, and I’m kinda stuck on where to go next, because, by my calculations, the memory usage of the state of these processes should be more like 20-50 KB each (I used this guide: http://erlang.org/doc/efficiency_guide/advanced.html#id68680).
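
Roughly, the estimate goes like this (a back-of-the-envelope sketch based on the numbers in that guide, assuming a 64-bit VM):

# Per the efficiency guide: a small map costs ~5 words plus its keys and
# values; a one/two-character key is a heap binary (~4 words); a small
# integer is 1 word; a list costs 1 word per element plus the elements.
words_per_map = 5 + 2 * 4 + 2 * 1        # %{"c" => id1, "nc" => id2}
maps = 53 + 20 + 33 + 53                 # all_choices + manual + inferred + comparisons
estimate_bytes = maps * (1 + words_per_map) * 8
# => roughly 20 KB for the list data, in the same ballpark as the 20-50 KB above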

Here’s a full dump of the state of a process that was using 4023kb of RAM: https://gist.github.com/pdilyard/92a04ccad39be87d05e466ed4dbea193

Any help would be greatly appreciated.

1 Like

Are you caching these users/thoughts inside your gen_server state? If so (someone correct me if wrong), due to immutability the references to the old states are kept in memory. Which means that when you update the state, the maps are copied over and over and never garbage collected, since the gen_server is a long-lived process.

The first way that comes to mind to deal with this is to use ETS, since it is a mutable structure.
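
Something like this, as a minimal sketch (the module name, table name, and key shape are made up for illustration):

defmodule MessageEngine.Cache do
  # One public named table holding users under {:user, id} keys.
  def init do
    :ets.new(:message_engine_cache, [:named_table, :public, :set])
  end

  def put_user(user_id, user) do
    :ets.insert(:message_engine_cache, {{:user, user_id}, user})
  end

  def get_user(user_id) do
    case :ets.lookup(:message_engine_cache, {:user, user_id}) do
      [{_key, user}] -> user
      [] -> nil
    end
  end
end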

1 Like

That’s the way the BEAM works… If it has to increase the size of a process heap, it won’t shrink it again until it really has to, e.g. because total memory wouldn’t be enough for all processes without shrinking some heaps.

Also, there are many different ways to collect memory data for a process; which did you use? If you really measured the total heap size, that doesn’t say much, since the heap can shrink and grow.

1 Like

Hmm, interesting. I figured the unused old copies of the data structures would be garbage collected.

My use case is kind of like caching, but what really happens is that a user is loaded into memory in a process, then a whole bunch of changes occur to the structure in a 2-5 minute window, and then the new state of the process is dumped back to the database.

The reason I’m not using ETS is because I need to distribute these across nodes in a cluster, and using Swarm as a process registry is a nice way to do that.

1 Like

I measured primarily using :observer.

1 Like

Which metric are we talking about?

1 Like

@sikanhe you are somewhat correct. The process will only garbage collect when it exits/hibernates.

@pdilyard what you can do, and I think it might work, is invoke :erlang.garbage_collect/0 in any of your callbacks (preferably the state-mutating ones). I think that might solve your problem at the cost of some performance degradation.

http://erldocs.com/current/erts/erlang.html?i=2&search=garba#garbage_collect/0
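
Something along these lines, as a sketch (the module name, callback, and state shape are placeholders):

defmodule ChoicesServer do
  use GenServer

  def init(state), do: {:ok, state}

  # Do the update, then ask the VM to collect this process's heap right away.
  def handle_call({:update, changes}, _from, state) do
    new_state = Map.merge(state, changes)   # stands in for the real state update
    :erlang.garbage_collect()
    {:reply, :ok, new_state}
  end
end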

1 Like

I used the “Memory and Garbage” section under “Process Information”.

And then I also calculated what I thought the amount of memory usage should be (or thereabouts) based on this guide: http://erlang.org/doc/efficiency_guide/advanced.html#id68680

1 Like

Shouldn’t the process be eventually garbage collected? I’ve let it sit for 10-20 minutes without any activity and the memory usage is still very high.

1 Like

There will be no GC if there is no reason for it.

GC happens (simplified) under only two circumstances. Either the stack and heap are colliding, in which case the current heap size will be doubled while the process’s garbage is collected. AFAIK this is the exact metric you are observing: just the amount of heap available to the process, whether it is used/filled or not.

The other reason GC may kick in is that another process is out of memory and the BEAM tries to get memory back from other processes by collecting and shrinking them.

This article about GC in OTP 19 explains it pretty well, and in all the detail you might or might not need.
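
For example, with pid bound to one of the suspect processes (e.g. copied from observer), the heap metrics observer shows can be read directly; heap_size and total_heap_size are in words, :memory is in bytes:

Process.info(pid, [:memory, :heap_size, :total_heap_size, :garbage_collection])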

3 Likes

I’d agree with the others here. If you are going to update an in-memory data structure, you are better off doing it in ETS if it’s getting updated pretty constantly. If it’s not, then you might be better off with mnesia to distribute it across the cluster. IMO, unless the updates are happening almost constantly over the course of 2-5 minutes, you are going to be better off persisting it and then retrieving it when the updates come.

2 Likes

Updating on disk had already proven to be too slow (each request requires a pool of other related data, and an update to 3+ “objects”, plus there are background jobs going on constantly to re-calculate scores and things). I did a lot of benchmarking to get to the GenServer implementation… The app is really a lot like a game.

Maybe the approach I should go for is to use GenServers to route requests to the proper node, but use ETS to actually maintain the state of each object. Any thoughts on that?

1 Like

Can you get recon from Hex, then try :recon.bin_leak(10) (http://ferd.github.io/recon/recon.html#bin_leak-1) when your system is using a lot of memory, and check 1) does usage go down a lot? and 2) what type of process shows up in the top list? (User vs Thought vs something else)

This will help pinpoint if GC really is the issue and where. Another possible problem would be if you receive large binary messages, then extract and keep in the state a small substring, which is really a pointer into the larger string and prevents it from being garbage collected.
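
The sub-binary effect is easy to see in iex (just an illustration):

big = :binary.copy("x", 1_000_000)                 # a large refc binary
small = binary_part(big, 0, 10)                    # 10-byte slice into big
:binary.referenced_byte_size(small)                # ~1_000_000: small keeps big alive
:binary.referenced_byte_size(:binary.copy(small))  # 10: a real copy lets big be collected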

You might find chapter 7 of Erlang in Anger useful (http://www.erlang-in-anger.com/).
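
For reference, recon is a regular Hex package, so something like this in mix.exs (the version is just an example) makes it available in a remote console:

# mix.exs
defp deps do
  [
    {:recon, "~> 2.3"}
  ]
end

# then, on a node that is using a lot of memory:
# iex> :recon.bin_leak(10)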

2 Likes

Yes, usage went down a ton after running that function actually (~1.2 GB to 120 MB on my machine). Can you expand a little bit on what this does and what it means (and how I might be able to fix my problem using these results)?

The 10 processes listed were not my User/Thought processes; they were Phoenix.Endpoint.CodeReloader, :ssl_manager, Logger, and a few other :gen_servers that didn’t have obvious names.

1 Like

Just for fun, I also called :recon.bin_leak(50), and the list was mostly full of values like this:

{#PID<0.22905.0>, -117,
  [current_function: {:gen_server, :loop, 6},
   initial_call: {:proc_lib, :init_p, 5}]},

I ran GenServer.call(pid(0, 23540, 0), :get), and can confirm that these are my User processes.

1 Like

Which function did you use to reduce memory?

1 Like

:recon.bin_leak/1 as suggested by @dom

1 Like

bin_leak forces a garbage collection on all processes and measures how many reference-counted binaries were freed per process. So this confirms that lack of GC is the issue here.

Some things you can do:

  • If you have operations that generate lots of refc binary garbage, do them in a separate, short-lived process linked to your long-lived user process, so it doesn’t accumulate garbage.
  • You can use a timer to hibernate the user process (see the GenServer docs, and the sketch after this list) after N seconds of inactivity, or when you know it won’t be getting messages for a while. The process will still be alive, but won’t hold extra memory.
  • You can also use a timer to force a gc every N seconds.
  • ETS, as mentioned, can help. Each process can own a table; it doesn’t have to be shared. This is a nice article about the difference it makes: http://theerlangelist.com/article/reducing_maximum_latency
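
A minimal sketch of the hibernate-after-inactivity idea (the module name, timeout, and callbacks are just placeholders):

defmodule IdleHibernatingServer do
  use GenServer

  @idle_ms 30_000

  # Return a timeout from every callback; if no message arrives within
  # @idle_ms, handle_info(:timeout, ...) fires and we hibernate, which
  # compacts the process and releases its extra heap.
  def init(state), do: {:ok, state, @idle_ms}

  def handle_call(:get, _from, state), do: {:reply, state, state, @idle_ms}

  def handle_info(:timeout, state), do: {:noreply, state, :hibernate}
end
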
5 Likes

Reduce the amount of memory you allocate to the BEAM. Why try to force some GC if you do not need it?

3 Likes

If this is due to binaries being kept alive by references, as opposed to just GC not running on those processes:

If you can identify which binaries are being kept around (sounds like messages from the user that are being parsed out, with the interesting components stored in those maps?), then consider using :binary.copy/1 on the binary snippets before storing them in your GenServer’s state. This will create a deep copy of those binaries, freeing the original binary they were pulled from, and then the GC can do its job on those original binaries.
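
A rough sketch of that (the extraction step is made up for illustration; the point is the :binary.copy/1 call before the slice goes into the state):

defmodule TextServer do
  use GenServer

  def init(_), do: {:ok, %{text: nil}}

  # Placeholder extraction: keep only the first 50 bytes of a large payload.
  # Without :binary.copy/1, the small slice would keep the whole payload alive.
  def handle_cast({:store_text, big_payload}, state) do
    snippet =
      big_payload
      |> binary_part(0, min(50, byte_size(big_payload)))
      |> :binary.copy()

    {:noreply, %{state | text: snippet}}
  end
end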

3 Likes