Why is data deep-copied when sending to another process?

I think one thing to realise wrt GC is that if you share data between processes, then when you do a GC you have to collect across all the processes and the whole heap at once. This means you need something like a real-time collector, which complicates things. Running in multiple threads, as the BEAM does, complicates matters even more, as you need to make the collector thread-safe and/or pay a large cost in synchronisation. This is in fact how the large binaries are handled, with the result that it can sometimes take a long time to reclaim their memory. Basically, every process which has referenced the binary has to do a full GC before the binary can be reclaimed. This can take a comparatively long time, and overflowing memory with unreclaimed large binaries is actually a real problem.
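A minimal sketch of how this bites in practice (the sizes here are just illustrative): a sub-binary keeps the whole large binary alive until the holding process collects, so you either force a GC or copy out the small part you need.

```erlang
%% Large (>64 byte) binaries are reference-counted and live off-heap.
Big  = binary:copy(<<0>>, 10 * 1024 * 1024),  %% one 10 MB refc binary
Part = binary:part(Big, 0, 10),               %% sub-binary: still references Big

%% Even after Big goes out of scope, Part keeps all 10 MB alive until
%% this process does a full GC. Either force a collection ...
erlang:garbage_collect(),

%% ... or copy the small slice so the reference to Big can be dropped:
Small = binary:copy(Part).
```

`process_info(Pid, binary)` is a handy way to see which refc binaries a process is still holding on to.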

So not sharing data, and copying data when sending messages, is actually quite a good way of doing it, even if it sounds wrong. Way back when, I did some implementations of Erlang with shared memory and real-time collectors, and it is not trivial to get right. Fun, though, but not trivial.
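The payoff of copy-on-send is that each process owns its heap outright, so it can be collected alone, at any time, without stopping anyone else. A tiny sketch:

```erlang
%% The message term is deep-copied into the receiver's heap, so the
%% receiver's GC never has to look at the sender's heap (large refc
%% binaries are the exception: only a reference is passed).
Pid = spawn(fun() ->
                receive
                    {list, L} -> io:format("got ~p elements~n", [length(L)])
                end
            end),
List = lists:seq(1, 1000),
Pid ! {list, List}.   %% List is copied; both heaps stay independent
```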

In OTP 22 there are some new special memory areas which are shared between processes, so some things are good and really fast, but you also pay a heavy price. Check out atomics, counters and persistent_term and you will see what I mean.
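For the curious, here is a quick sketch of those three. Reads are cheap precisely because the data is shared and never copied; the "heavy price" shows up on the write side, most visibly with persistent_term, where updating or erasing a key forces every process in the system to be scanned:

```erlang
%% counters: write-optimised shared integer arrays.
C = counters:new(1, []),
counters:add(C, 1, 1),
1 = counters:get(C, 1),

%% atomics: lock-free atomic integers with read-modify-write ops.
A = atomics:new(1, [{signed, true}]),
atomics:add(A, 1, 5),
5 = atomics:get(A, 1),

%% persistent_term: near-free reads (no copying onto the caller's
%% heap), but put/erase can trigger a global scan/GC of all processes.
ok = persistent_term:put(my_key, #{mode => fast}),
#{mode := fast} = persistent_term:get(my_key).
```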
