Agent keeps large binary memory despite no state

Hello,

I’m having some trouble optimizing the memory usage of my application. Its job is to manage and collect data from a bunch of networked devices (a typical IoT application).

I create a process for each managed device (via DynamicSupervisor) that handles communication and data retrieval. I send the data (a map) to an Agent process that accumulates the data in a giant map of device address => device data. Once every few seconds, another process grabs this accumulated data (if present), encodes it with JSON.encode!/1 and publishes it on MQTT. When the data retrieval happens, the Agent resets its internal state to an empty map via get_and_update.
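For concreteness, the accumulate/flush pattern described above looks roughly like this (module and function names are my own invention, not necessarily the actual code):

```elixir
defmodule DeviceCache do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Device processes push their latest reading in.
  def put(address, data), do: Agent.update(__MODULE__, &Map.put(&1, address, data))

  # The publisher grabs everything and atomically resets the state to an empty map.
  def flush, do: Agent.get_and_update(__MODULE__, fn state -> {state, %{}} end)
end
```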

This works fine, but for whatever reason the agent accumulates binary memory over time. :recon.proc_count(:memory, 10) shows its memory constantly increasing, even though the data it collects should be temporary. I can call :recon.bin_leak/1 to clear the memory out, but I’d rather find out what’s causing the constant memory increases. For reference, the number of devices is less than 1000 and the data accumulated for each device can be enough to push the outgoing encoded message to ~1MB.

What might I be doing incorrectly here? I’d rather not run garbage collection manually, but I can do so if necessary. Thanks!


This is a persistent problem that periodically bites people. I can’t be of much help except to give you a drive-by comment from memory: try hibernating the process periodically and see if that helps? I seem to remember people saying this helped.
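For what it’s worth, `Agent.start_link/2` forwards GenServer options, so hibernation can be automatic without touching any call sites. A sketch (the timeout value is made up):

```elixir
# Hibernate the Agent after 15 s without messages; :hibernate_after is a
# standard GenServer option that Agent passes through to the underlying server.
{:ok, pid} = Agent.start_link(fn -> %{} end, hibernate_after: 15_000)
```

Hibernation does a full-sweep garbage collection and shrinks the heap, which is exactly what tends to release stale refc-binary references.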

Yeah, I did see this recommendation on previous related discussions, but control messages come in every 20 seconds, so the caching Agent is constantly getting filled/flushed. I don’t know if hibernation would really help here; maybe there’s some other process that’s holding the memory and causing the issue?

Yes, very likely. Large binaries are a consistent headache in the BEAM.

Another drive-by comment: tried using :binary.copy and never storing an original big binary? Admittedly I am going off the rails here because I don’t quite know your code and needs and it seems that this suggestion kind of goes against them.
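In the cache’s write path that could look like this (a sketch; `DeviceCacheCopy` and `put/2` are hypothetical stand-ins for the real Agent):

```elixir
defmodule DeviceCacheCopy do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Detach the (possibly sub-)binary before it lands in long-lived state,
  # so the original big frame is free to be collected.
  def put(address, data),
    do: Agent.update(__MODULE__, &Map.put(&1, address, :binary.copy(data)))
end
```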

Looking around some more: Erlang, Binaries, and Garbage Collection (Sigh)

It turns out that refc-binaries keep track of every process that has ever touched them!

I know, it’s pretty obvious in retrospect, but the point here is that a refc-binary is not clobbered till every process that has ever touched it has been garbage-collected.

So that means whenever I want to ease binary memory in my app I’d need to wait until all of these are garbage collected:

  • the agent
  • the controlling process
  • each managed device process
  • any other process that routes the binaries at all

…or just call :erlang.garbage_collect/0 myself periodically. I already have a timer in production that goes off periodically to check the number of open file descriptors for the application; I can probably make the change to call eval “:erlang.garbage_collect” there, as most of the processes that touch the binaries are intended to be long-lived.
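Note that `:erlang.garbage_collect/1` also accepts a pid, so the existing timer could sweep the known long-lived processes directly rather than only collecting itself. A sketch, where `MyApp.DeviceSupervisor` and `MyApp.DeviceCache` are placeholder names for the DynamicSupervisor and Agent from the thread:

```elixir
defmodule MyApp.GCSweep do
  # Force a collection on each long-lived process that handles the big
  # binaries: the caching Agent plus every managed-device process.
  def run do
    device_pids =
      MyApp.DeviceSupervisor
      |> DynamicSupervisor.which_children()
      |> Enum.map(fn {_, pid, _, _} -> pid end)

    for pid <- [Process.whereis(MyApp.DeviceCache) | device_pids], is_pid(pid) do
      :erlang.garbage_collect(pid)
    end
  end
end
```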

Alternatively, and if your architecture allows it: have the processes that work with the big binaries be throwaway and only send small messages to the long-lived processes?
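A sketch of that approach, using a fire-and-forget Task (`summarize/1` here is a stand-in for whatever parsing actually happens):

```elixir
defmodule MyApp.Throwaway do
  # Stand-in for the real parsing of the device payload.
  defp summarize(bin), do: %{bytes: byte_size(bin)}

  # The big binary is only ever touched inside the short-lived Task; when the
  # Task exits, its heap (and its reference to the binary) goes with it. Only
  # a small summary term crosses over to the long-lived owner.
  def handle_payload(big_binary, owner) do
    Task.start(fn ->
      send(owner, {:summary, summarize(big_binary)})
    end)
  end
end
```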

I only pulled that off once though and it was fairly troublesome to do. Or again, just hibernate the process(es) once every 100 messages or so?

So nice to see a topic with the word “Agent” in it that is actually referring to the OG :grin:

I know it’s not actually the OG, I’m speaking contextually, jeez


But what do you do while your agent is garbage collecting?

More seriously, :observer is useful to me in those binary leak cases.


Yeah, I’m using :observer and :recon to help figure out where the memory is growing. It’s definitely process heap memory that’s expanding due to large refc binaries I was passing around like they were free. Lesson learned, I guess.

I’ll see if I can move most of the stuff dealing with large binaries into short-lived/temporary processes that I spawn when I get a new control message. For the longer-running device processes that interact with the large binaries, I’ll try manually garbage collecting those after sending the data along to ETS tables (instead of an Agent for distributed writes), then delete/shutdown the temporary stuff once all data has been processed in a given cycle.

Will try this tomorrow and see how it goes.


First, why is that a problem? The system does not guarantee that it triggers a garbage collection right after responding to a control message. This data will eventually be garbage collected, and the system decides when that happens. You won’t run out of memory.

Second, have you tried using an ETS table? You can just clear the table while reading from it with :ets.select_delete/2.
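A minimal version of that flow (the read and the delete here are two separate operations, so a write landing in between could be lost; table name and row shape are made up):

```elixir
# A long-lived owner creates a public named table.
:ets.new(:device_cache, [:set, :public, :named_table])

# Device processes write directly — no single-process bottleneck:
:ets.insert(:device_cache, {"dev1", %{temp: 21}})

# The publisher reads everything, then clears the table; the
# [{:_, [], [true]}] match spec matches (and deletes) every object.
rows = :ets.tab2list(:device_cache)
:ets.select_delete(:device_cache, [{:_, [], [true]}])
```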

A refc binary means that there is actually only one global instance of the binary kept in memory, and all these processes are just referencing it. It is an optimization.

The other problem is sub-binaries. As a parsing optimization, when we want to extract part of a binary into a new binary, the runtime actually just creates a reference into the bigger binary, thus keeping the bigger binary in memory. This is solved by :binary.copy/1, and it only applies if you extract your device information from large binaries with certain operations. But again, using :binary.copy/1 is an optimization which makes garbage collection a bit more efficient.
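You can see the sub-binary effect directly with `:binary.referenced_byte_size/1`:

```elixir
big = :binary.copy(<<"x">>, 1_000_000)
<<_header::binary-size(16), payload::binary>> = big

# The sub-binary still pins the whole 1 MB allocation...
:binary.referenced_byte_size(payload)                  # => 1_000_000
# ...while a copy references only its own bytes:
:binary.referenced_byte_size(:binary.copy(payload))    # => 999_984
```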

can probably make the change to call eval “:erlang.garbage_collect”

Garbage collection in the BEAM is local to the process. Calling :erlang.garbage_collect/0 clears the memory only in the calling process.


People have already mentioned most of what I wanted to recommend, so I’ll just list them again with a bit of explanation.

  • The Erlang VM uses reductions to schedule execution and trigger GC, so write small functions and avoid building overly large terms in one go (a long-running function can balloon the heap before GC gets a chance to run).
  • Hibernate processes (to reduce memory usage) when they need to wait a long time for new messages.
  • Manually trigger GC after function calls that may produce a lot of garbage.
  • Push state to ETS tables (with the :compressed option), then terminate the process.
  • Use streams for processing when dealing with potentially large memory usage (e.g., big lists).
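The ETS point can be sketched as follows (table name and contents are illustrative):

```elixir
# :compressed stores terms in a compressed format on the table, trading
# some CPU on reads/writes for a smaller resident size.
table = :ets.new(:readings, [:set, :public, :compressed])
:ets.insert(table, {"dev1", String.duplicate("a", 10_000)})
```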