Scanning a bitstring for a value

I asked the following on Freenode’s #elixir-lang channel but quickly got buried in connect/DC messages :wink: Hopefully someone here can help.

(09:38:30 PM) Cheezmeister1: Hey all, having trouble searching a bitstring for a value. I’m trying to read the WXXX ID3 frame (http://id3.org/id3v2.3.0#User_defined_URL_link_frame). There’s a “Description” string of unknown length followed by a null char, both potentially UTF-16. This seems quite simple, but without traditional iteration I’m surprisingly lost.
(09:38:42 PM) Cheezmeister1: I only really need the content following the null char.
(09:41:07 PM) Cheezmeister1: Related: Enum.reduce doesn’t want to operate on a bitstring, but I can use them in comprehensions. Why is that?

If I can provide any further info on what I’m trying to accomplish, just ask.
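On the side question about Enum.reduce: binaries don’t implement the Enumerable protocol (presumably because there is no single obvious element type; bytes, UTF-8 codepoints, graphemes and UTF-16 code units are all reasonable), so Enum.reduce/3 on a binary raises Protocol.UndefinedError. Comprehensions are different because a bitstring generator carries its own pattern, which tells for how to slice the binary. A minimal sketch; the module and function names are made up for illustration, and the reduce: option needs Elixir 1.8 or later:

```elixir
defmodule BitstringIteration do
  # The generator pattern c::utf16 tells `for` to walk the binary one
  # UTF-16 character at a time; Enum has no way to know that.
  def codepoints(bin) do
    for <<c::utf16 <- bin>>, do: c
  end

  # `for ... reduce:` gives the fold behaviour of Enum.reduce/3.
  def count_nulls(bin) do
    for <<c::utf16 <- bin>>, reduce: 0 do
      acc -> if c == 0, do: acc + 1, else: acc
    end
  end

  # Alternatively, turn the binary into a list of bytes and use Enum as usual.
  def sum_bytes(bin) do
    bin |> :binary.bin_to_list() |> Enum.reduce(0, &+/2)
  end
end
```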

It seems like one good way to go about this is to pattern match on the parts whose size you know, and then scan the parts you don’t. The code demonstrates two different recursive ways to do the scan, depending on whether what you’re parsing is 0-terminated or not: parse_description is tail-recursive, collects characters in reverse in an accumulator and reverses it once at the terminator, while parse_link runs until the end of the binary and builds its list front to back. It’s intentionally verbose so you can see different ways of matching.

The spec you linked to isn’t super clear, but hopefully this gets you closer.

```elixir
defmodule ID3 do
  # Simplified: expects the frame ID followed directly by a 1-byte text
  # encoding. A real ID3v2.3 frame header also has a 4-byte size and
  # 2-byte flags after the ID, which would need to be matched first.
  def url_link(<<"W", code::bytes-size(3), encoding, rest::binary>>) do
    {description, rest} = parse_description(rest, [])
    link = parse_link(rest, [])

    {"W" <> code, encoding, description, link}
  end

  # 0 terminated: accumulate in reverse, flip once at the terminator
  defp parse_description(<<0::utf16, rest::binary>>, acc) do
    {Enum.reverse(acc), rest}
  end

  defp parse_description(<<c::utf16, rest::binary>>, acc) do
    parse_description(rest, [c | acc])
  end

  # end of binary terminated: build the list front to back
  defp parse_link("", acc), do: acc

  defp parse_link(<<c::utf16, rest::binary>>, acc) do
    [c | parse_link(rest, acc)]
  end
end
```

You can test it like so:

```elixir
iex> tag = <<"WXXX", 1, "This is my description"::utf16, 0::utf16, "https://www.google.com"::utf16>>
iex> ID3.url_link(tag)
{"WXXX", 1, 'This is my description', 'https://www.google.com'}
```
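Since only the content after the null matters here, you may not need to collect the description at all. A sketch that just drops it and returns the rest; SkipTerminated and its functions are hypothetical names. Stepping two bytes at a time is safe for UTF-16 because every code unit (including each half of a surrogate pair) is exactly 2 bytes and only the terminator is <<0, 0>>:

```elixir
defmodule SkipTerminated do
  # Drop a UTF-16 string terminated by <<0, 0>> and return what follows.
  # Advancing by whole 2-byte code units means the terminator is only
  # recognised on a code-unit boundary.
  def skip_utf16(<<0, 0, rest::binary>>), do: {:ok, rest}
  def skip_utf16(<<_unit::binary-size(2), rest::binary>>), do: skip_utf16(rest)
  # Ran out of input (or hit an odd trailing byte) without a terminator.
  def skip_utf16(_), do: :error

  # For single-byte text such as ISO-8859-1 the terminator is one 0 byte,
  # so :binary.split/2 can find it in a single call.
  def skip_latin1(bin) do
    case :binary.split(bin, <<0>>) do
      [_description, rest] -> {:ok, rest}
      [_no_terminator] -> :error
    end
  end
end
```

Using :binary.split(bin, <<0, 0>>) directly on UTF-16 data is tempting but unsafe: <<0x100::utf16, ?A::utf16>> is the bytes <<1, 0, 0, 65>>, and the first <<0, 0>> it finds straddles two characters.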

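If it’s not safe to assume that the description and URL are always UTF-16 (and under the v2.3 spec it isn’t: the encoding byte is $00 for ISO-8859-1 or $01 for 16-bit Unicode with a byte-order mark, and the URL itself is always ISO-8859-1), then you can add clauses for the other encodings or do the byte-size matching yourself.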
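One way to handle the encodings is to dispatch on the encoding byte. A rough sketch based on the ID3v2.3 layout (encoding byte, description, a $00 or $00 00 terminator to match, then an ISO-8859-1 URL); it assumes the input is the frame body with the 10-byte frame header already stripped, WXXX.parse/1 is a made-up name, and unknown encodings, a missing BOM, or a missing terminator simply crash:

```elixir
defmodule WXXX do
  # $00: ISO-8859-1 description terminated by a single 0 byte.
  def parse(<<0, rest::binary>>) do
    [description, url] = :binary.split(rest, <<0>>)
    {:unicode.characters_to_binary(description, :latin1), latin1_url(url)}
  end

  # $01: 16-bit Unicode description with a BOM, terminated by <<0, 0>>.
  def parse(<<1, rest::binary>>) do
    {description, url} = split_utf16(rest, <<>>)
    {decode_utf16_bom(description), latin1_url(url)}
  end

  # Step two bytes at a time so <<0, 0>> is only matched on a code-unit boundary.
  defp split_utf16(<<0, 0, rest::binary>>, acc), do: {acc, rest}

  defp split_utf16(<<unit::binary-size(2), rest::binary>>, acc),
    do: split_utf16(rest, acc <> unit)

  # The BOM decides the byte order; the result is a UTF-8 Elixir string.
  defp decode_utf16_bom(<<0xFF, 0xFE, text::binary>>),
    do: :unicode.characters_to_binary(text, {:utf16, :little})

  defp decode_utf16_bom(<<0xFE, 0xFF, text::binary>>),
    do: :unicode.characters_to_binary(text, {:utf16, :big})

  # An empty description may carry no BOM at all.
  defp decode_utf16_bom(<<>>), do: ""

  # The URL is always ISO-8859-1 in ID3v2.3.
  defp latin1_url(url), do: :unicode.characters_to_binary(url, :latin1)
end
```

Returning UTF-8 strings rather than charlists keeps the result usable with the String module, whichever encoding the tag used.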

Thanks @brainbag, I think this is what I need. Close to my evolving solution, with your accum/reverse as the secret ingredient.

Yeah the ID3 spec can be a bit hand-wavey, but believe me, I’ve read worse :wink:


Great! I’m glad that helped. I haven’t done this a lot in Elixir, but I’ve done a ton of data-processing work in the games industry: networking code, sound systems, custom file systems, etc. There were so many problems I could have solved much more easily if I’d had Elixir/BEAM.