Why does a bitstring use more words than a binary

I’ve been reading this article: How Elixir Lays Out Your Data in Memory . From what I understand, :erts_debug.size/1 can be used to see how many “words” of memory that a given term takes up. For example:

iex(3)> :erts_debug.size(:a)
0
iex(4)> :erts_debug.size("abc")
6
iex(5)> :erts_debug.size([1])
2
iex(6)> :erts_debug.size([1, 2])
4
iex(7)> :erts_debug.size([])
0
iex(8)> :erts_debug.size(1)
0
iex(9)> :erts_debug.size(1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
12
iex(10)> :erts_debug.size(%{"a" => 1})
12

But what I can’t figure out is why a single bit uses 11 words, while a byte uses only 6:

iex(1)> :erts_debug.size(<<1::1>>)
11
iex(2)> :erts_debug.size(<<97>>)
6

Could someone explain why this is so?

1 Like

I haven’t (yet) searched in code, but I see 2 possible explanations:

  • there was "abc" string somewhere in shared heap
  • small string optimisation

The latter is quite interesting beast. It works due to fact, that on 64-bit machine (most of used nowadays) when you store a pointer, then some of the top bits will not be ever set. This mean, that if these are set, then it will 100% not point to valid memory. That mean that we can do hack like storing tag in few top bits, and data in rest. For example if binary is shorter than 7 bytes, then it will be stored directly inside 64-bit pointer. I assume that if such optimisation is used, then it may be, that it is not used for bitstring, and only for “full” binaries.

2 Likes

I think the struct that represents a binary on the heap can only have a size in bytes, whereas sub-binary references can also have bit size/offset, so <<1::1>> is stored as a binary + a sub-binary. See the branch here :

And the struct declarations:

5 Likes

This is really interesting. I guess the lesson here is to remember that binaries/bitstrings are first and foremost a way to work with bytes. Sure, you can operate on bits too, but that doesn’t make the memory allocated any less (as we can see it does the opposite :laughing: ) Is this a correct way to think about it?

1 Like

Not precisely. That extra overhead for working with bitstrings is fixed and never goes beyond what you saw. It’s just that there’s no way to just store only 3 bits somewhere and have that only take a byte at most. You still need some management metadata for such a data structure but that metadata is very small and quickly amortized if you manage only several bytes worth of bit data.

3 Likes