Why is this reduced Stream consuming so much memory?

Hi there!

When I try to build my Commanded aggregate from events, my Render.com instance crashes with an out-of-memory error (over 2Gi used).

I am building the aggregate from 100_000 events. That’s a lot, but nothing enormous, I hope.
Each event reports between 150 and 200 when I call :erts_debug.size(event) (the result is in machine words, not bytes).
The final (built) aggregate reports around 1_400_000 when I call :erts_debug.size(aggregate). It starts at almost zero, and each event increases its size a bit.
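
To put those numbers in perspective, here is a minimal sketch that converts the word counts to bytes. The `event` map is a made-up stand-in, not one of the real events. As far as I know, :erts_debug.size/1 also counts only the on-heap part of a term, so large binaries stored off-heap would not show up fully in these figures.

```elixir
# :erts_debug.size/1 reports a term's size in machine words, not bytes.
word_size = :erlang.system_info(:wordsize)

# Hypothetical event, only here to have something to measure.
event = %{id: 1, type: "ItemAdded", payload: %{sku: "abc", quantity: 2}}

words = :erts_debug.size(event)
bytes = words * word_size
IO.puts("event: #{words} words, about #{bytes} bytes")

# The aggregate size quoted above (1_400_000 words), converted to bytes.
aggregate_bytes = 1_400_000 * word_size
IO.puts("aggregate: about #{Float.round(aggregate_bytes / 1_000_000, 1)} MB")
```

On a 64-bit VM a word is 8 bytes, so the finished aggregate would be roughly 11 MB, far below the 2Gi limit.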

EventStore is backed by PostgreSQL and the event data is stored as jsonb.

On my MacBook, with the configuration below, it takes around 20 seconds.

Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.10.2
Erlang 22.2.8

That seems reasonable, considering the number of events. I am just surprised that this can consume so much memory. I thought the garbage collector would get rid of the already consumed events and the discarded versions of the aggregate. Isn’t that the case?

This is the part responsible for building the aggregate. I assume it’s correct, but I am linking it here just in case.

I am really curious to know why it is so memory hungry and what exactly is causing this. Any tips on how to profile this? Or is it even worth it, if that memory consumption is just a fact I have to deal with?
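
One simple way to start profiling, sketched below under my own assumptions: run the fold in a fresh process and read that process's memory figures when it finishes. `MemProbe` and the sample events are hypothetical and have nothing to do with Commanded; only `Process.info/2` and `:erlang.memory/1` are standard calls.

```elixir
defmodule MemProbe do
  # Hypothetical helper: runs `fun` in a separate process and returns its
  # result together with a few memory figures taken right after it finishes.
  def measure(fun) do
    task =
      Task.async(fn ->
        result = fun.()
        info = Process.info(self(), [:memory, :total_heap_size])

        %{
          result: result,
          # Bytes used by this process (heap, stack, internal structures).
          process_bytes: info[:memory],
          # Heap size of this process, in words.
          heap_words: info[:total_heap_size],
          # Bytes currently allocated for binaries in the whole VM.
          vm_binary_bytes: :erlang.memory(:binary)
        }
      end)

    Task.await(task, :infinity)
  end
end

# Made-up events carrying a 100-byte binary each.
events = for i <- 1..10_000, do: %{seq: i, data: String.duplicate("x", 100)}

report =
  MemProbe.measure(fn ->
    Enum.reduce(events, 0, fn event, acc -> acc + byte_size(event.data) end)
  end)

IO.inspect(report)
```

Comparing `process_bytes` with `vm_binary_bytes` might hint at whether the memory sits on the process heap or in off-heap binaries.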

EDIT: Benchee reports around 13GB of memory usage when benchmarking with the option memory_time: 10.

Thanks a lot.

Hi there, this reminds me of what can happen when a routing process handles large numbers of messages containing reference-counted binaries (strings larger than 64 bytes) but makes very few (if any) allocations of its own. Because the process rarely allocates, it rarely triggers a garbage collection, so its references to those binaries stay alive. One strategy for dealing with this is to force a garbage collection every n messages, so that the references are dropped and the binaries can be freed.

So… what if you fetched the events in batches and forced a GC on the process that does the aggregation before moving on to the next batch? I’m not sure how easy this is to do with Commanded, but perhaps it’s worth a try.
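
A rough sketch of that idea, not tied to Commanded's API: `BatchedRebuild`, `apply_fun`, and the batch size of 1_000 are all made up for illustration. The only real calls are `Stream.chunk_every/2`, `Enum.reduce/3`, and `:erlang.garbage_collect/0`.

```elixir
defmodule BatchedRebuild do
  # Hypothetical helper: folds events into the state one batch at a time
  # and forces a garbage collection of the calling process after each batch.
  def rebuild(events, initial_state, apply_fun, batch_size \\ 1_000) do
    events
    |> Stream.chunk_every(batch_size)
    |> Enum.reduce(initial_state, fn batch, state ->
      new_state = Enum.reduce(batch, state, apply_fun)

      # Collect garbage left behind by processed events and discarded
      # intermediate states before the next batch is pulled in.
      :erlang.garbage_collect()

      new_state
    end)
  end
end

# Made-up lazy event source and a trivial "aggregate" that sums amounts.
events = Stream.map(1..10_000, fn i -> %{amount: i} end)
total = BatchedRebuild.rebuild(events, 0, fn event, acc -> acc + event.amount end)
IO.inspect(total)
```

Because the source is a lazy stream, only one batch is materialized at a time. Whether this helps in your case depends on where the events are actually held in memory.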


OK, I did some benchmarking and created a PR which should drastically reduce the memory consumption. There are more details in its description here.
I am really curious what causes such a big difference in memory consumption.
Does anyone have any ideas?
