Trouble understanding octet string (binary) iteration

The Comprehensions article says:

In Elixir, it is common to loop over an Enumerable … Comprehensions are syntactic sugar for such constructs

However, this simple example of Enumerable manipulation:

# [11, 22, 33]
[1, 2, 3] |> Enum.map(fn i -> i+10*i end)

fails when the input is instead an octet string:

# (Protocol.UndefinedError) protocol Enumerable not implemented for <<1, 2, 3>> of type BitString.
<<1, 2, 3::8>> |> Enum.map(fn i -> i+10*i end)

Instead, it only seems to work with the allegedly “syntactic sugar” of a comprehension:

# [11, 22, 33]
<<1, 2, 3>> |> (fn s -> (for <<i::8 <- s>>, do: i+10*i) end).()

What am I doing wrong?

What do I need to do to enumerate a BitString in chunks of a given width, especially 8 bits?

EDIT [X-Y Problem]:

I’m trying to do this as part of a larger function which counts the “leading zeroes” of an octet string in a rather perverse way: the octets are counted in something like big-endian order*, while the bits within each octet are counted little-endian. So I’d require <<0, 255, 0, 0>> -> 8 but <<0, 254, 0, 0>> -> 9.

*That is, in the “usual” iteration order one gets from Python’s for octet in buf or C’s for ( i=0 ; i<buflen ; i++ ){ octet = buf[i]; }.
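To pin down that bit order, here is a deliberately naive sketch (the module name LeadingZerosEager and both function names are hypothetical). It lists the bits octet by octet, but least significant bit first within each octet, and then counts the zeros before the first 1. It reproduces the two examples above, but it builds the whole bit list before counting, so it has none of the early exit wanted here.

```elixir
defmodule LeadingZerosEager do
  import Bitwise

  # Hypothetical helper: bits of `bin`, octets first-to-last,
  # but each octet read from its least significant bit upward.
  def bits_lsb_first(bin) when is_binary(bin) do
    for <<octet::8 <- bin>>, i <- 0..7, do: band(bsr(octet, i), 1)
  end

  # Counts the zero bits before the first 1 bit in that order.
  # Note: the full bit list is built first, so there is no early exit.
  def count(bin) when is_binary(bin) do
    bin
    |> bits_lsb_first()
    |> Enum.take_while(&(&1 == 0))
    |> length()
  end
end
```

For example, bits_lsb_first(<<254>>) gives [0, 1, 1, 1, 1, 1, 1, 1], which is why <<0, 254, 0, 0>> ends up one bit longer than <<0, 255, 0, 0>>.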

The bit/binary syntax usable with for is a feature of for itself, which is a special form. Binaries and bitstrings do not implement the Enumerable protocol. The issue is that there’s no clear cadence at which a binary or bitstring should be split. for can adjust the split automatically, because it sees the pattern on the left-hand side of <- (for example, <<i::8 <- s>> takes 8 bits at a time). With Enumerable, however, you don’t know anything about the consumer of the individual pieces, so you need to split the binary or bitstring up manually.
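Splitting manually could look like the following minimal sketch (the module BinSplit and its chunks/2 function are hypothetical names). It walks the bitstring with a recursive binary match and eagerly collects width-bit unsigned integers into a list, which Enum can then consume. For the plain 8-bit case, Erlang’s :binary.bin_to_list/1 already does this: :binary.bin_to_list(<<1, 2, 3>>) returns [1, 2, 3].

```elixir
defmodule BinSplit do
  # Hypothetical helper: eagerly splits a bitstring into `width`-bit
  # unsigned integers. Trailing bits shorter than `width` are dropped.
  def chunks(bits, width) when is_bitstring(bits) and is_integer(width) and width > 0 do
    do_chunks(bits, width, [])
  end

  defp do_chunks(bits, width, acc) do
    case bits do
      # Take `width` bits off the front and recurse on the remainder.
      <<chunk::size(width), rest::bitstring>> -> do_chunks(rest, width, [chunk | acc])
      # Fewer than `width` bits left: stop and restore the original order.
      _leftover -> Enum.reverse(acc)
    end
  end
end
```

With that, <<1, 2, 3>> |> BinSplit.chunks(8) |> Enum.map(fn i -> i+10*i end) gives [11, 22, 33], matching the list version.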

I see. So it’s not only syntactic sugar; in certain cases it actually does new things?

Is there another way to get at the enumeration-of-bitstrings? (Of course, I understand I’ll need to specify the desired element “width” in any case.)

The function I’m trying to build will need early exit: I want to iterate through the octet string only until I find a non-zero octet, then return. Is there any “lazy” way to iterate through an octet string, that is, without first “greedily” slurping the entire string into a list, as for seems to do?

Hmm, I think I’ve figured it out now.

# Lazily emits one octet per step; the stream ends at the empty binary.
def binary_to_stream(s) do
  Stream.unfold(s, fn
    <<>> -> nil
    <<next::8, rem::binary>> -> {next, rem}
  end)
end
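Because the result is a lazy stream, it can be combined with Enum functions that halt early. A minimal sketch, assuming the function is placed in a module called ByteStream and adding a hypothetical first_nonzero/1 helper: Enum.find/2 stops as soon as it gets a match, so the unfold never splits off the octets after the first non-zero one.

```elixir
defmodule ByteStream do
  # Same unfold as above: emits one octet per step and stops at <<>>.
  def binary_to_stream(s) when is_binary(s) do
    Stream.unfold(s, fn
      <<>> -> nil
      <<next::8, rest::binary>> -> {next, rest}
    end)
  end

  # Hypothetical helper: returns {octet, index} for the first non-zero
  # octet, or nil if every octet is zero. Enum.find/2 halts on the first
  # match, so only the octets up to that point are ever produced.
  def first_nonzero(s) do
    s
    |> binary_to_stream()
    |> Stream.with_index()
    |> Enum.find(fn {octet, _index} -> octet != 0 end)
  end
end
```

For example, ByteStream.first_nonzero(<<0, 254, 0, 0>>) returns {254, 1}.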

You can use Stream.unfold to turn a binary into an enumerable of “pieces”:

<<1, 2, 3>>
|> Stream.unfold(fn
  <<i::8, r::binary>> -> {i, r}
  <<>> -> nil
end)
|> Enum.into([])
# [1, 2, 3]
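The same idea generalizes to any chunk width while staying lazy. A minimal sketch (BitChunkStream and stream/2 are hypothetical names): the match happens in a case inside the unfold function, with the width taken from the enclosing scope, and any trailing bits shorter than the width end the stream.

```elixir
defmodule BitChunkStream do
  # Hypothetical helper: a lazy stream of `width`-bit unsigned integers
  # taken from the front of `bits`. Trailing bits shorter than `width`
  # are dropped, which ends the stream.
  def stream(bits, width) when is_bitstring(bits) and is_integer(width) and width > 0 do
    Stream.unfold(bits, fn acc ->
      case acc do
        <<chunk::size(width), rest::bitstring>> -> {chunk, rest}
        _leftover -> nil
      end
    end)
  end
end
```

Only as many chunks as the consumer asks for are split off, so <<1, 2, 3>> |> BitChunkStream.stream(8) |> Enum.take(2) stops after the second octet.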

You can, but it’s worth noting that for will produce more efficient code than wrapping this in a stream.

If the early exit really makes a difference, you could also do explicit recursion.
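With explicit recursion, the early exit falls out of the function clauses directly. A minimal sketch of the leading-zeroes count from the question, assuming the input is a whole number of octets (the module name LeadingZerosRec is hypothetical, and counting the low-order zero bits with rem/div is just one possible choice):

```elixir
defmodule LeadingZerosRec do
  # Counts zero bits before the first 1 bit, walking octets first-to-last
  # but reading each octet from its least significant bit.
  def count(bin) when is_binary(bin), do: count(bin, 0)

  # A zero octet contributes 8 zero bits; keep going.
  defp count(<<0, rest::binary>>, acc), do: count(rest, acc + 8)
  # First non-zero octet: add its low-order zero bits and stop here.
  defp count(<<octet, _rest::binary>>, acc), do: acc + trailing_zeros(octet)
  # Ran out of input: every octet was zero.
  defp count(<<>>, acc), do: acc

  # Only ever called with a non-zero octet, so this terminates.
  defp trailing_zeros(octet) when rem(octet, 2) == 1, do: 0
  defp trailing_zeros(octet), do: 1 + trailing_zeros(div(octet, 2))
end
```

This gives 8 for <<0, 255, 0, 0>> and 9 for <<0, 254, 0, 0>>, and never looks past the first non-zero octet.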
