When do IOLists dedupe strings into references?

When does deduplication happen for IOLists? If I create a massive IOList with a bunch of macros expanding inside each other, will there be a final pass that deduplicates binary segments?

e.g.

1. EExFun.outter() #=> [
  "<div>", 
  EExFun.inner() #=> ["<div> some string </div>"],
  "</div>"
]
  
2. ["<div>", "<div> some string </div>", "</div>"]

3. [ref_div_open(), [ref_div_open(), "some string", ref_div_close()], ref_div_close()] 
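To make shape 1 concrete, here is a minimal sketch. `IOListSketch` is a hypothetical stand-in for the `EExFun` module in the example above, not real EEx output. It only shows that a nested IO list stays nested as a data structure and that the bytes are laid out contiguously when something asks for them, for instance via `IO.iodata_to_binary/1`.

```elixir
defmodule IOListSketch do
  # Hypothetical stand-ins for the functions in the question.
  def inner, do: ["<div> some string </div>"]
  def outer, do: ["<div>", inner(), "</div>"]
end

nested = IOListSketch.outer()

# The list keeps its nesting; nothing rewrites it into shape 2 or 3.
IO.inspect(nested)
# → ["<div>", ["<div> some string </div>"], "</div>"]

# The byte count can be computed without building a flat binary first.
IO.inspect(IO.iodata_length(nested))
# → 35

# Flattening happens only on request.
IO.inspect(IO.iodata_to_binary(nested))
# → "<div><div> some string </div></div>"
```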

If I’m understanding your question correctly then what you ask for is already happening internally in the runtime: repeated strings are a part of a reference-counted pool.

What do you mean by deduping?

Two literal values that represent the same string might be merged into a single literal by the compiler. This is considered an optimisation: the observable behaviour of the program stays the same whether those values end up as one literal or two.

At runtime, though, any binary that is bigger than a certain threshold (64 bytes) is stored on the binary heap, where it is refcounted. But just because you create a string somewhere else that happens to have the exact same value, it won't count towards the old refcounted string; it will create a second copy of it.
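A rough way to see the difference between reusing a binary and rebuilding an equal one is sketched below. `BinarySharingSketch` is a hypothetical module name, and `:erts_debug.same/2` is an undocumented debugging helper that reports whether two terms are the very same term in memory, so treat this as an exploration aid rather than something to rely on in production code. The binary is built at runtime with `String.duplicate/2` so that it is not a compile-time literal.

```elixir
defmodule BinarySharingSketch do
  # 100 bytes is above the 64-byte limit for on-heap binaries.
  # Built at runtime, so it is not a literal in the module.
  def big, do: String.duplicate("x", 100)

  # Reusing one binary inside an IO list stores references to the same term.
  def reused_is_shared? do
    a = big()
    [first, [second | _] | _] = [a, [a, "some string"], a]
    :erts_debug.same(first, second)
  end

  # Building an equal binary a second time gives equal content,
  # but the two terms are separate copies.
  def rebuilt do
    a = big()
    b = big()
    {a == b, :erts_debug.same(a, b)}
  end
end

IO.inspect(BinarySharingSketch.reused_is_shared?())
IO.inspect(BinarySharingSketch.rebuilt())
```

The first call is expected to print `true` and the second `{true, false}`: sharing comes from passing the same term around, not from the runtime noticing that two binaries have equal content.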

So there is no magic deduplication as I understand it, and I think that is fine, as I would fear the runtime cost of comparing against all existing strings on the binary heap every time we create or build a new string/binary.

If you want to poke around with how the VM will make use of your strings, there's a tool linked from this article that may be of use: http://www.evanmiller.org/elixir-ram-and-the-template-of-doom.html

Thanks, that’s actually the article that got me interested in how it was happening.

Gotcha, the ref counted pool gives me something to google as I’m trying to understand the specifics of when it happens. Thanks!

Yeah, unfortunately I don't know if the Linux version of DTrace is ready yet. Maybe it is; I should probably check. Linux does have bpftrace, though, and I'm working on setting that up now.

Okay, that’s really good to know and clears up a misconception I had, thanks