I’ve been reading this article: How Elixir Lays Out Your Data in Memory . From what I understand, :erts_debug.size/1 can be used to see how many “words” of memory that a given term takes up. For example:
I haven’t (yet) searched in code, but I see 2 possible explanations:
there was "abc" string somewhere in shared heap
small string optimisation
The latter is quite interesting beast. It works due to fact, that on 64-bit machine (most of used nowadays) when you store a pointer, then some of the top bits will not be ever set. This mean, that if these are set, then it will 100% not point to valid memory. That mean that we can do hack like storing tag in few top bits, and data in rest. For example if binary is shorter than 7 bytes, then it will be stored directly inside 64-bit pointer. I assume that if such optimisation is used, then it may be, that it is not used for bitstring, and only for “full” binaries.
I think the struct that represents a binary on the heap can only have a size in bytes, whereas sub-binary references can also have bit size/offset, so <<1::1>> is stored as a binary + a sub-binary. See the branch here :
This is really interesting. I guess the lesson here is to remember that binaries/bitstrings are first and foremost a way to work with bytes. Sure, you can operate on bits too, but that doesn’t make the memory allocated any less (as we can see it does the opposite ) Is this a correct way to think about it?
Not precisely. That extra overhead for working with bitstrings is fixed and never goes beyond what you saw. It’s just that there’s no way to just store only 3 bits somewhere and have that only take a byte at most. You still need some management metadata for such a data structure but that metadata is very small and quickly amortized if you manage only several bytes worth of bit data.