Elixir processes and no shared heap memory

vals · October 14, 2017, 6:08pm

Elixir processes have their own heap. If a process wants to share a data structure with another process, how could that be possible? One answer that comes to my mind is that the process sends a message to the other process containing the data structure. Does that mean that the entire data structure is copied from one heap to the other? And if this is true, isn’t it inefficient?

NobbZ · October 14, 2017, 6:09pm

Yes and yes, but copying guarantees that no process holds references into another process's heap, and thus the runtime can do per-process GC instead of stop-the-world GC.
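To make the copy concrete, here is a minimal sketch of handing a list to another process by message. The names `data`, `child` and `sum` are just illustrative. The list is copied onto the child's heap on `send/2`, so the parent's copy is untouched whatever the child does with it.

```elixir
# A list the parent builds on its own heap.
data = Enum.to_list(1..1_000)
parent = self()

# The child receives the list as a message. The runtime copies it onto the
# child's heap, so the two processes never point at the same memory.
child =
  spawn(fn ->
    receive do
      {:data, list} -> send(parent, {:sum, Enum.sum(list)})
    end
  end)

send(child, {:data, data})

# Wait for the child's answer (give up after one second).
sum =
  receive do
    {:sum, value} -> value
  after
    1_000 -> :timeout
  end
```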

benwilson512 · October 14, 2017, 9:10pm

There are trade-offs either way. The disadvantage of individual heaps is, as you note, that you have to copy.

Advantages however are numerous. When a process exits all of its heap can be trivially cleaned up without jeopardizing the heap of any other process. This is both useful (and fast) in intentional exits, and even more useful when it comes to resiliency in the face of unexpected errors.

As mentioned, isolated heaps mean that each process can be GCed independently. Most processes end up with very small heaps, and often these can fit entirely within the CPU cache so that while that process is being worked on all of its memory is readily available and very fast to access.
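If you want to see this per-process accounting yourself, a rough sketch like the following works in `iex`. It only uses `Process.info/2` and `:erlang.garbage_collect/1`; the variable names are made up for the example, and the reported size (in machine words) will vary between systems.

```elixir
# Spawn an idle process that just waits for a :stop message.
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# Each process reports its own heap size, measured in machine words.
{:total_heap_size, words} = Process.info(pid, :total_heap_size)

# Garbage collection can be requested for this one process only;
# no other process has to pause for it.
collected = :erlang.garbage_collect(pid)

send(pid, :stop)
```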

yurko · October 15, 2017, 7:57pm

If a process wants to share a data structure with another process, how could that be possible?

There are cases when sharing data between processes is exactly what you need to do; you can use ETS (Erlang -- ets) to achieve that.

NobbZ · October 15, 2017, 9:24pm

Reading and writing in and out of ETS still copies the whole data structures involved.

kylethebaker · October 16, 2017, 3:27am

Just to add a little bit to what’s already been said: the cost of copying is a very transparent cost as opposed to a more subtle cost. So although it may be more expensive you at least know up front that the cost exists which makes it easier to acknowledge and reason about. You might be able to identify places where you are sending large data structures unnecessarily when only a small piece is needed by the receiving process and be able to work around this in an approachable way, whereas dealing with costs that arise from garbage collecting a massive heap can be a lot trickier to deal with.
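As a small sketch of that kind of workaround: assume a hypothetical `report` map with a large `rows` list, where the receiving process only needs the `title`. Sending just that field keeps the copy tiny.

```elixir
# Hypothetical data: a small title next to a large list of rows.
report = %{title: "quarterly", rows: Enum.to_list(1..100_000)}
parent = self()

worker =
  spawn(fn ->
    receive do
      {:title, title} -> send(parent, {:done, String.upcase(title)})
    end
  end)

# Only the title is copied to the worker. Sending `report` itself would
# copy all 100_000 rows as well.
send(worker, {:title, report.title})

result =
  receive do
    {:done, title} -> title
  after
    1_000 -> :timeout
  end
```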

yurko · October 16, 2017, 7:03am

Reading and writing in and out of ETS still copies the whole data structures involved.

That depends. If the structure is “structured enough” to have uniform parts under separate keys, then those parts can be written and read separately, possibly by different processes.

NobbZ · October 16, 2017, 7:52am

I can’t imagine how this should work, since A reads the data, B alters it, then A starts to process the data, or was even halfway through before B altered it… How is this supposed to work without copying?

Anyway, regardless of some optimisations here, that may only occur under special conditions, it is safe to assume “always copy”. Under this assumption there will be no bad surprises when starting to communicate with processes on other nodes.

yurko · October 16, 2017, 10:09am

I can’t imagine how this should work, since A reads the data, B alters it, then A starts to process the data, or was even halfway through before B altered it… How is this supposed to work without copying?

If the data structure is, for example, a list of structs, each of those can have a separate entry in an ets table and be dealt with separately (only the needed pieces of data are actually copied around). The whole thing can be adjusted by setting read / write concurrency, by serializing access via a single process if needed, etc.
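Roughly what I mean, as a sketch. The table name `:users_sketch` and the `users` maps are made up for the example; plain maps stand in for structs here.

```elixir
# A public table, so any process can read and write it.
table = :ets.new(:users_sketch, [:set, :public, read_concurrency: true])

users = [
  %{id: 1, name: "ann"},
  %{id: 2, name: "bob"},
  %{id: 3, name: "cy"}
]

# One entry per element, keyed by id, instead of one entry holding the list.
Enum.each(users, fn user -> :ets.insert(table, {user.id, user}) end)

# A lookup copies only the matching entry onto the caller's heap,
# not the whole collection.
[{2, user}] = :ets.lookup(table, 2)
```

Each lookup is still a copy, just a much smaller one than fetching the whole list.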

Here’s a short read on the topic if you’re interested: Yariv's Blog: Erlang does have shared memory

NobbZ · October 16, 2017, 10:57am

Thanks for the read, I hope I’ll get some time to read through it.

OvermindDL1 · October 18, 2017, 10:27pm

Of note, you can have a shared structure with no copying, but it has to be static at compile time: you just bake it into the module source, for example directly into a function. There are even a couple of libraries that can do that compilation at runtime easily, so it behaves like a store with slow updates but super-fast reads.
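A minimal sketch of the compile-time variant, with a made-up module name `StaticLookup`. As far as I know, the map ends up as a literal in the compiled module, so callers of `table/0` get a reference to it rather than a fresh copy on their heap.

```elixir
defmodule StaticLookup do
  # Evaluated once, at compile time, and embedded in the module as a literal.
  @table Map.new(1..1_000, fn n -> {n, n * n} end)

  # Returns the baked-in map. Changing it means recompiling the module.
  def table, do: @table

  # Convenience reader on top of the baked-in map.
  def square(n), do: Map.fetch!(table(), n)
end
```

Any process can then call `StaticLookup.square(12)` without a message or an ETS lookup.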

idi527 · October 19, 2017, 4:37pm

But the objects still get copied to and from ets, don’t they?

From the article

Objects are copied when inserted into and looked-up from ets tables.

OvermindDL1 · October 19, 2017, 4:45pm

Yes they do.

yurko · October 19, 2017, 5:26pm

Yes. If your data structure consists of many objects, you can use ets to copy only the ones that you need, which covers most cases with really big data structures.

In case it’s something that only translates to an ets table with one entry, the whole thing would not make much sense.