Bitstring and binary

kostonstyle · November 12, 2016, 1:57pm

Hi all

Could someone please explain to me what a bitstring is?

And what is the difference between a bitstring and a binary?

Thanks

minhajuddin · November 12, 2016, 2:10pm

A bitstring is a type that stores an arbitrary number of bits, so you can have a 5-bit bitstring, whereas a binary stores an arbitrary number of whole bytes.

Here is some code that should make things clearer:

# bitstring
bs = << 3 :: size(2) >>      # => 2 bits 11
IO.inspect bs                # => <<3::size(2)>>
IO.inspect is_bitstring(bs)  # => true
IO.inspect is_binary(bs)     # => false

# binary
bin = << 3 >>                # => 8 bits or 1 byte
IO.inspect bin               # => <<3>>
IO.inspect is_bitstring(bin) # => true
IO.inspect is_binary(bin)    # => true

A binary is just a sequence of bytes, so its number of bits has to be divisible by 8 (i.e. the size of a byte). So you can have an 8-bit binary, a 16-bit binary and so on. If the number of bits is not divisible by 8, e.g. 7, 14, 15 or 23 bits, you have a bitstring that is not a binary. And since a bitstring can have any number of bits, every binary is also a bitstring. However, the inverse is not true.
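The divisible-by-8 rule can be checked directly with the Kernel functions bit_size/1 and byte_size/1. This is only a small sketch; the variable names are made up for illustration.

```elixir
# A 3-bit bitstring: not a whole number of bytes, so it is not a binary.
bits = <<5::size(3)>>
IO.inspect(bit_size(bits))    # => 3
IO.inspect(byte_size(bits))   # => 1 (byte_size rounds up for partial bytes)
IO.inspect(is_binary(bits))   # => false

# Two full bytes: 16 bits, so it is both a bitstring and a binary.
bytes = <<1, 2>>
IO.inspect(bit_size(bytes))               # => 16
IO.inspect(byte_size(bytes))              # => 2
IO.inspect(rem(bit_size(bytes), 8) == 0)  # => true
IO.inspect(is_binary(bytes))              # => true
```

Note that byte_size/1 alone cannot tell the two apart, because it rounds up; bit_size/1 or is_binary/1 is what distinguishes them.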

jaysoifer · November 12, 2016, 2:24pm

If you want to have a deeper understanding of the subject and are willing to invest half an hour to appreciate it, I strongly recommend the following video:

ElixirConf 2016 - String Theory by Nathan Long & James Edward Gray II

Really useful, really interesting, loved every minute of it.

nathanl · November 14, 2016, 1:30pm

Thanks @jaysoifer! I also wrote some blog posts covering some of the talk topics in a little more depth:

@kostonstyle The way I phrased this distinction was:

In Elixir, a “bitstring” is anything between << and >> markers, and it contains a contiguous series of bits in memory. If there happen to be 8 of those bits, or 16, or any other number divisible by 8, we call that bitstring a “binary” - a series of bytes. And if those bytes are valid UTF-8, we call that binary a “string”.

So a subset of bitstrings are binaries, and a subset of binaries are strings. Like this:

If you don’t understand what it means for something to be “UTF-8 encoded”, the first blog post should help.
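As a rough illustration of the binary versus string distinction, String.valid?/1 reports whether a binary is valid UTF-8. The sample values below are chosen only for demonstration.

```elixir
# "h\u00E9llo" is "héllo"; the é takes 2 bytes in UTF-8.
valid = "h\u00E9llo"
IO.inspect(is_binary(valid))       # => true
IO.inspect(String.valid?(valid))   # => true
IO.inspect(byte_size(valid))       # => 6
IO.inspect(String.length(valid))   # => 5

# The byte 0xFF never occurs in valid UTF-8, so this binary is not a string.
invalid = <<255>>
IO.inspect(is_binary(invalid))      # => true
IO.inspect(String.valid?(invalid))  # => false
```

There is no separate string type at runtime: a string is simply a binary that passes this validity check.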

kostonstyle · November 15, 2016, 10:33am

Consider an example:

a = << 3 >>

Is a bitstring or binary, that contains number 3?

NobbZ · November 15, 2016, 10:45am

Since you do not specify a size for that element, it is assumed to be 1 byte. So the variable a holds a bitstring of length 8 bits, which is also a binary of length 1 byte, and it is even a string (it is not printable, but it contains only valid codepoints).
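The "valid but not printable" point can be seen with String.valid?/1 and String.printable?/1. A small sketch:

```elixir
a = <<3>>

IO.inspect(bit_size(a))            # => 8
IO.inspect(byte_size(a))           # => 1
IO.inspect(String.valid?(a))       # => true  (byte 3 is the codepoint U+0003)
IO.inspect(String.printable?(a))   # => false (it is a control character)

# Because it is not printable, IO.inspect shows the raw bytes
# instead of a quoted string.
IO.inspect(a)                      # => <<3>>
```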

Remember the picture from above, every string is a binary and every binary is a bitstring, but not necessarily the other way round.

A bitstring is a binary if and only if its number of bits is evenly divisible by 8.
A binary is a string if and only if it contains only valid Unicode codepoints encoded in UTF-8.

So as you can see, binary and string are both proper subsets of bitstring, and string is a proper subset of binary.
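The two "if and only if" rules can be sketched as a small classifier. The module Classify and its function kind/1 are hypothetical names invented for this example; they are not part of Elixir.

```elixir
defmodule Classify do
  # Hypothetical helper: returns the most specific category a value falls into.

  # Whole number of bytes: a binary, and a string if it is valid UTF-8.
  def kind(value) when is_binary(value) do
    if String.valid?(value), do: :string, else: :binary
  end

  # Any other bitstring has a bit count that is not divisible by 8.
  def kind(value) when is_bitstring(value), do: :bitstring
end

IO.inspect(Classify.kind(<<3::size(2)>>))  # => :bitstring
IO.inspect(Classify.kind(<<255>>))         # => :binary
IO.inspect(Classify.kind(<<3>>))           # => :string
IO.inspect(Classify.kind("abc"))           # => :string
```

The clause order matters: is_binary/1 is checked first because every binary would also satisfy is_bitstring/1.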

minhajuddin · November 15, 2016, 10:45am

It is both, as a quick test shows:

a = << 3 >>
IO.puts is_bitstring(a) # => true
IO.puts is_binary(a) # => true

When you write << 3 >>, an implicit ::size(8) modifier is added. So << 3 >> is the same as << 3 :: size(8) >>, and since the number of bits is divisible by 8, it is both a binary and a bitstring.
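The implicit size can be confirmed by comparing the two forms. The sketch below also shows that several smaller segments still make a binary when their sizes add up to a multiple of 8; the name packed is made up for the example.

```elixir
# The default size of an integer segment is 8 bits.
IO.inspect(<<3>> == <<3::size(8)>>)   # => true
IO.inspect(<<3>> == <<3::8>>)         # => true (::8 is shorthand for ::size(8))

# 2 bits (11) followed by 6 bits (000001) give the single byte 11000001.
packed = <<3::size(2), 1::size(6)>>
IO.inspect(bit_size(packed))          # => 8
IO.inspect(is_binary(packed))         # => true
IO.inspect(packed == <<193>>)         # => true (0b11000001 is 193)
```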