Erlang GC gotchas

I’ve been reading up on reactive programming and ran into this comment [0] on HN.

The poster wrote that he’d had some serious problems with Erlang’s GC.

I spent a total of 11 months consulting with a company that built a large (50kloc) financial system in erlang. They had terrible performance problems that were caused entirely by erlang.

Imagine you have a large amount of data (order books, accounts, etc). You could put it all in one erlang process, but the gc does not cope well with large heaps (multi-second pauses). You could store the data outside the heap (eg ets) but then you pay the cost of copying on every access and have to tradeoff ease of use (more data per key) vs performance (less data per key). You could split the data up into many processes and then all your simple calculations become asynchronous protocols. Have fun debugging the math or rolling back changes on errors.

I went into that contract with a fondness for erlang. Now I wouldn’t touch it ever again. A naive single-threaded blocking server achieved 10x less code, 40x better throughput and 100x better latency. I used clojure, but any sane platform would have worked just as well with that design.

I usually hear about how good Erlang’s way of doing garbage collection (per process) is, but it seems it’s not particularly suitable in the case when there are “large numbers of small objects eg thousands of orders per market.”
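To make the ETS trade-off from the quote a bit more concrete, here is a minimal sketch. The table name, the `market_a` key, and the order tuples are made up for illustration; only the `ets` calls are real. It stores one big row per market and then reads it back two ways, which is roughly the "more data per key vs less data per key" choice described above.

```erlang
-module(ets_copy_sketch).
-export([demo/0]).

%% Hypothetical data: one ETS row per market, holding all of its orders.
demo() ->
    Tab = ets:new(order_books, [set, public, {keypos, 1}]),
    Orders = [{order, Id, 100 + Id rem 10, 1} || Id <- lists:seq(1, 5000)],
    true = ets:insert(Tab, {market_a, length(Orders), Orders}),

    %% ets:lookup/2 copies the whole stored tuple, including the order list,
    %% onto the heap of the calling process.
    [{market_a, _Count, Copied}] = ets:lookup(Tab, market_a),

    %% ets:lookup_element/3 copies only the requested element (the counter).
    Count = ets:lookup_element(Tab, market_a, 2),

    true = ets:delete(Tab),
    {Count, length(Copied)}.
```

Calling `ets_copy_sketch:demo()` returns `{5000, 5000}`. Splitting the row into one key per order would make each read cheaper, at the price of more bookkeeping in the calling code.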

I wonder if anyone else has had similar problems with it, and how they solved them?

[0] I spent a total of 11 months consulting with a company that built a large (50klo... | Hacker News

So he wrote a large financial system in 5kloc of Clojure :)

That depends on what version of Erlang they used. OTP 19 introduced a lot of improvements to the GC and 20 added some additional niceties to reduce the GC work.

EDIT: given that the HN comment is almost 3 years old, I would expect they were most probably talking about the now-ancient OTP R16B.
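If you want to check which release a node is running and experiment with per-process GC settings, something like the sketch below should work. The module name and the concrete numbers are arbitrary, picked only for illustration; `message_queue_data` is, as far as I know, only available from OTP 19 onwards, so treat that option as an assumption about your runtime.

```erlang
-module(gc_opts_sketch).
-export([release/0, spawn_tuned/1]).

%% Returns the OTP release of the running node as a string.
release() ->
    erlang:system_info(otp_release).

%% Spawns Fun with a few GC-related options. The values are illustrative,
%% not recommendations:
%%   min_heap_size      - initial heap size in words, avoids early GC runs
%%   fullsweep_after    - minor collections allowed before a full sweep
%%   message_queue_data - keep queued messages off the process heap (OTP 19+)
spawn_tuned(Fun) when is_function(Fun, 0) ->
    spawn_opt(Fun, [{min_heap_size, 4096},
                    {fullsweep_after, 10},
                    {message_queue_data, off_heap}]).
```

For example, `Pid = gc_opts_sketch:spawn_tuned(fun() -> receive stop -> ok end end)` gives you a process whose settings you can then inspect with `erlang:process_info(Pid, garbage_collection)`.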


As the following article on how Erlang's GC works explains, this issue is mostly caused by bad code, not by a bad GC. (I think these are usually mistakes made by people coming from OO languages who switched to Erlang without learning enough about Erlang-style code, e.g. excessive and unnecessary use of GenServers added at runtime, or creating too many processes that run forever.)

You can learn more about Erlang GC here: https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html

I have a working reverse proxy written in Elixir that has been in production for about 7 months. It processes 70~200 GB of binary data each day (parsing it with binary pattern matching). After a long uptime, it currently holds only about 400 MB of RAM. The coolest part is that memory usage doesn't vary under load. (To me that means the memory allocated to processes is free and usable, just reserved!)
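For anyone curious what that looks like, here is a rough sketch in Erlang (the Elixir version is nearly identical). The frame layout, a 16-bit length prefix followed by the payload, is a hypothetical format and not what my proxy actually uses. `snapshot/0` uses `erlang:memory/1`, where `processes` versus `processes_used` gives an approximate in-VM view of allocated versus in-use process memory; it is not the same number the OS reports.

```erlang
-module(mem_sketch).
-export([parse_frame/1, snapshot/0]).

%% Hypothetical wire format: 16-bit big-endian length, then that many
%% payload bytes. Anything after the payload is returned as Rest.
parse_frame(<<Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
parse_frame(Bin) when is_binary(Bin) ->
    %% Not enough bytes yet for a complete frame.
    {more, Bin}.

%% Returns [{Type, Bytes}] for a few memory categories of the running VM.
snapshot() ->
    erlang:memory([total, processes, processes_used, binary]).
```

`mem_sketch:parse_frame(<<3:16, "abcx">>)` returns `{ok, <<"abc">>, <<"x">>}`, and calling `mem_sketch:snapshot()` before and during load is a cheap way to see whether usage really stays flat.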
