Elixir memory model question

nikiiv · June 19, 2024, 12:36pm

I am putting together a list of gotchas for an OMS prototype I'm building for fun.
Given that Elixir data is immutable (probably more of a BEAM question, of course), how exactly does this play out within one process?
Let's say I have a struct for Customer and one for Product, each taking approximately 10KB of memory.
Then we have a struct for Order that contains keys for both Customer and Product. Let's say that with those two properties/fields set to nil, an Order would occupy 1KB.
Say we have 1000 Customers and 1000 Products.
If we combine every Customer with every Product into an Order, giving 1_000_000 orders, what memory usage should we expect within one process:
1M orders at 21KB each, or 1M orders at 1KB each plus, in each order, two references into the 2 × 1000 × 10KB of customers and products (their weight before being combined into orders)?
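Plugging the figures from the question into both options (1_000_000 orders, 2 × 1000 × 10KB of customer and product data, decimal units) shows why the answer matters: the two estimates differ by a factor of roughly 20.

```latex
\begin{aligned}
\text{no sharing:}\quad & 10^6 \times 21\,\text{KB} = 21 \times 10^6\,\text{KB} \approx 21\,\text{GB} \\
\text{sharing:}\quad & 10^6 \times 1\,\text{KB} + 2 \times 1000 \times 10\,\text{KB} = 1.02 \times 10^6\,\text{KB} \approx 1.02\,\text{GB}
\end{aligned}
```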

Given immutability, I expect an Order to hold only a reference to its Customer and Product, and a new memory allocation to happen only if we “mutate” a field in the nested structure.
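A minimal sketch to test this expectation, assuming plain on-heap data: the `Customer`, `Product` and `Order` modules, their fields, and the 1_000-element integer lists standing in for “10KB” of data are all hypothetical. It builds many orders that point at the same customer and product, then compares `:erts_debug.size/1` (which counts shared subterms once) with `:erts_debug.flat_size/1` (which counts the term as if every shared part were copied).

```elixir
# Hypothetical structs, loosely modelled on the question.
defmodule Customer do
  defstruct [:name, :notes]
end

defmodule Product do
  defstruct [:sku, :description]
end

defmodule Order do
  defstruct [:id, :customer, :product]
end

# Lists of integers live on the process heap; binaries over 64 bytes would
# be stored off-heap instead, which would hide the effect being measured.
customer = %Customer{name: "ACME", notes: Enum.to_list(1..1_000)}
product = %Product{sku: 42, description: Enum.to_list(1..1_000)}

# Every order holds the same customer and product terms, not copies of them.
orders = for id <- 1..1_000, do: %Order{id: id, customer: customer, product: product}

# Sharing-aware size: each shared subterm is counted once.
shared_words = :erts_debug.size(orders)
# Flattened size: what the term would cost if it were copied, for example
# when sent as a message to another process, where sharing is lost.
flat_words = :erts_debug.flat_size(orders)

IO.puts("shared: #{shared_words} words, flattened: #{flat_words} words")
```

Within one process the shared figure is the one that reflects the heap; the flattened figure is closer to what you would pay after copying the orders elsewhere.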

I expect atoms to be reused, but for the sake of the question let's assume that no atoms are used in any of the three structs.

An OMS (Order Management System) uses a lot of properties that stay static during the business day, while other data (say, accounts) changes after order execution, which is a multi-step process. So it is important to know what to expect when some of the order-related properties are reused across many orders.
I could go with ETS, but… do I really have to?
Any good blog posts or reads on the topic?

Thanks in advance

NobbZ · June 19, 2024, 12:49pm

Struct keys are atoms, so your assumption doesn’t hold.

Maps with up to 32 entries are (internally) implemented as a tuple of keys plus a flat array of values; bigger ones use a tree (a hash array mapped trie). Both variants allow you to share the unchanged parts of the data.

So if you have a struct of size X and “change” only one key’s value, the struct’s key and value “pointers” have to be copied, with the one changed value pointer set to point at the new value. That lets you reuse all the actual keys and unchanged values.
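One rough way to check this is to keep the original map alive next to an updated copy and ask `:erts_debug.size/1` how many extra words the pair costs. The `UpdateCost` module and the map shapes are hypothetical; the point is only that the extra cost should not depend on how large the unchanged values are.

```elixir
defmodule UpdateCost do
  # Hypothetical helper: how many extra heap words does changing one field
  # cost while the original is still alive? Putting both versions in one
  # tuple lets :erts_debug.size/1 see the data they share.
  def extra_words(original) do
    updated = %{original | version: original.version + 1}
    :erts_debug.size({original, updated}) - :erts_debug.size(original)
  end
end

small = %{version: 1, payload: Enum.to_list(1..10)}
big = %{version: 1, payload: Enum.to_list(1..10_000)}

# Both numbers cover the new map's own words plus the wrapping tuple;
# the unchanged `payload` list is shared, so its length does not matter.
IO.puts("small payload: #{UpdateCost.extra_words(small)} extra words")
IO.puts("big payload:   #{UpdateCost.extra_words(big)} extra words")
```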

nikiiv · June 19, 2024, 1:19pm

It’s not the keys: it’s the values of those keys that I’m assuming aren’t atoms, for the sake of understanding where data will be shared until modified (IOW, copy on write). You can’t change the keys of a struct, just the values.

benwilson512 · June 19, 2024, 1:37pm

Why guess when you can measure!

Check out :erts_debug.size/1 to get the size of a given term, in words.
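Since the result is in machine words, multiplying by `:erlang.system_info(:wordsize)` converts it to bytes. `erts_debug` is a debugging module, so treat its numbers as estimates of heap usage; the sample term below is arbitrary.

```elixir
term = %{content: Enum.to_list(1..100)}

# Size in machine words, with shared subterms counted once.
words = :erts_debug.size(term)
# Word size in bytes for this emulator (8 on a 64-bit build).
word_bytes = :erlang.system_info(:wordsize)

IO.puts("#{words} words = #{words * word_bytes} bytes")
```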

NobbZ · June 19, 2024, 1:59pm

That will show (roughly) the same values for the original and the “mutated” struct when you measure each one on its own, as it cannot take into account the data the two of them share.
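A quick way to see both effects, using a hypothetical customer map: measured one at a time, the original and the updated map come out the same size, while measuring them together inside one tuple shows that the `notes` list is stored only once.

```elixir
customer = %{tier: 1, notes: Enum.to_list(1..1_000)}
upgraded = %{customer | tier: 2}

# Measured separately, each map is charged for the whole `notes` list.
alone_original = :erts_debug.size(customer)
alone_upgraded = :erts_debug.size(upgraded)

# Measured together, the shared `notes` list is counted only once.
together = :erts_debug.size({customer, upgraded})

IO.puts("original alone: #{alone_original} words")
IO.puts("upgraded alone: #{alone_upgraded} words")
IO.puts("both together:  #{together} words")
```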

nikiiv · June 19, 2024, 2:14pm

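    {:ok, file} = File.read("/Users/...../memory_playground/java_error_in_idea_44801.log")
    z = %{content: file}
    l = []
    # `l` inside the comprehension is still the empty list bound above, so each
    # element is [z, []]: a million small lists that all point at the same map z
    l = for _cnt <- 1..1_000_000, do: [z | [l]]
    # :erts_debug.size/1 walks the whole term, counting shared subterms once
    IO.puts(:erts_debug.size(l))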

After 10 minutes it is still computing the size in words…
Is it possible that it counts something it has already accounted for while traversing the structure?
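A scaled-down version of the experiment above suggests what is going on; the 1_000 elements (instead of 1_000_000) and the integer list standing in for the file contents are assumptions. `:erts_debug.size/1` counts each shared subterm only once, but to do that it has to remember the subterms it has already visited, which makes it noticeably slower than `:erts_debug.flat_size/1`, which simply counts everything as if it were copied. Timings will vary by machine.

```elixir
z = %{content: Enum.to_list(1..1_000)}
l = List.duplicate(z, 1_000)

# :timer.tc/1 returns {microseconds, result}.
{size_us, shared_words} = :timer.tc(fn -> :erts_debug.size(l) end)
{flat_us, flat_words} = :timer.tc(fn -> :erts_debug.flat_size(l) end)

IO.puts(":erts_debug.size/1      #{shared_words} words in #{size_us} us")
IO.puts(":erts_debug.flat_size/1 #{flat_words} words in #{flat_us} us")
```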

nikiiv · June 19, 2024, 2:30pm

I read a 4MB PDF (Adopting Elixir.pdf), put it in a map with one key, and then added that map 100 times to a list.
The file is 3.8MB, and according to :erlang.memory(:total) this ended up using about 6MB of memory… an overhead of about 2MB for 100 copies.
So I guess that until a change is made, the values are passed by reference…

I could still use some good blog posts on the topic…

LostKobrakai · June 19, 2024, 2:32pm

How much of your data is binary data >64 bytes? Such binaries are put on a shared heap and reference counted instead of having multiple copies around.

For everything else, the “Memory Usage” page of the Erlang System Documentation (v27.0) and :erts_debug.size are your friends.
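A rough illustration of the reference-counted binaries mentioned above, assuming nothing else in the VM allocates large binaries meanwhile: `:erlang.memory(:binary)` reports VM-wide binary memory, and the 4_000_000-byte binary and map shape are arbitrary stand-ins for a large file. Putting the same binary into 100 maps should grow binary memory by about one copy, not 100.

```elixir
before = :erlang.memory(:binary)

# ~4 MB, far above the 64-byte limit, so it is allocated off-heap
# and reference counted rather than copied around.
big = :binary.copy("x", 4_000_000)

# 100 maps that all point at the same binary.
maps = for i <- 1..100, do: %{id: i, content: big}

grown = :erlang.memory(:binary) - before
IO.puts("#{length(maps)} maps, binary memory grew by #{grown} bytes")
```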

rvnash · June 20, 2024, 4:36pm

My first question is what you mean by “reference”. There are no references in Elixir in the sense of pointers you can see or manipulate. There are only the data themselves.

Secondly, when a data structure, like the map here, is modified, the underlying BEAM does not make a complete copy of the whole structure, although it appears to at the level of the Elixir abstraction. It’s actually some pretty clever data structure manipulation that usually keeps the memory impact limited.

I’ve found some clues about how this all works here. But to be honest, at the level of the programming abstraction, I don’t know how to predict how it will perform.