Hi, I'm new to Elixir, and as a learning exercise I'm trying to read ID3v2 tags from MP3 files. The standard library's binary pattern matching looks great for this.
I have a question about how to parse the text data (title, artist, etc.).
According to the spec, there is a flag that carries the encoding information, but I don't know how to read it. I think the title (TIT2) is in UTF-16, because some of its bytes are 0.
My current code:
file = File.stream!("./lib/file.mp3", [:read, :binary], 128)
id3tag_bytes = file |> Enum.take(1) |> hd
<<header::binary-size(10), rest::binary>> = id3tag_bytes
<<"ID3", major::binary-size(1), revision::binary-size(1), flags::binary-size(1),
size::binary-size(4)>> =
header
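# Bit 6 of the flags byte (second from the top) marks an extended header.
case flags do
  <<_unsync::1, 1::1, _other_flags::6>> -> IO.puts("Extended header")
  _ -> IO.puts("No extended header")
end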
<<frame_overview::binary-size(10), frame::binary>> = rest
<<frame_id::binary-size(4), frame_size::binary-size(4), frame_flags::binary-size(2)>> =
frame_overview
<<frame_size_int::size(4)-unit(8)>> = frame_size
IO.inspect(frame_size_int, label: "Frame Size")
<<title::binary-size(frame_size_int - 10), _rest::binary>> = frame
# id3 = <<header::binary-size(3), _v::binary-size(1), _flags::binary-size(1), _size::binary-size(4)>>
IO.inspect(header, label: "Header")
IO.inspect(:binary.decode_unsigned(major), label: "Version")
IO.inspect(:binary.decode_unsigned(revision), label: "Revision")
IO.inspect(:binary.decode_unsigned(flags), label: "Flags")
IO.inspect(frame_id, label: "Frame ID")
IO.inspect(:binary.decode_unsigned(frame_size), label: "Frame Size")
IO.inspect(size, label: "Size")
IO.inspect(frame_overview, label: "Frame Overview")
IO.inspect(title, label: "Title")
IO.inspect(title |> :unicode.characters_to_binary(:utf16, :utf8), label: "Title")
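For the tag header itself, the flags and the size can be picked apart with bit-level patterns. This is only a minimal sketch, assuming the ID3v2.3 header layout: the top three flag bits are unsynchronisation, extended header and experimental, and the size is stored as four "synchsafe" bytes that each carry 7 bits. `ID3HeaderSketch` and `parse/1` are hypothetical names, not part of any library:

```elixir
defmodule ID3HeaderSketch do
  # Hypothetical helper: parses the 10-byte ID3v2 tag header.
  # Each size byte has its top bit set to 0 and carries 7 bits of the size.
  def parse(
        <<"ID3", major, revision, unsync::1, extended::1, experimental::1, _unused::5,
          0::1, s1::7, 0::1, s2::7, 0::1, s3::7, 0::1, s4::7>>
      ) do
    # Glue the four 7-bit groups together and read them back as one 28-bit integer.
    <<size::28>> = <<s1::7, s2::7, s3::7, s4::7>>

    {:ok,
     %{
       major: major,
       revision: revision,
       unsynchronisation: unsync == 1,
       extended_header: extended == 1,
       experimental: experimental == 1,
       size: size
     }}
  end

  # Anything that does not look like an ID3v2 header.
  def parse(_other), do: :error
end

# The first 10 bytes of the file from this post.
{:ok, header_info} = ID3HeaderSketch.parse(<<73, 68, 51, 3, 0, 0, 0, 4, 109, 110>>)
IO.inspect(header_info, label: "Header")
```

For this file the sketch reports version 3, no extended header, and a tag size of 79598 bytes (the size excludes the 10-byte header).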
Here are the first 100 bytes of the file (id3tag_bytes):
<<73, 68, 51, 3, 0, 0, 0, 4, 109, 110, 84, 73, 84, 50, 0, 0, 0, 19, 0, 0, 1,
255, 254, 76, 0, 97, 0, 32, 0, 81, 0, 117, 0, 234, 0, 116, 0, 101, 0, 84, 80,
69, 49, 0, 0, 0, 17, 0, 0, 1, 255, 254, 79, 0, 114, 0, 101, 0, 108, 0, 115, 0,
97, 0, 110, 0, 84, 65, 76, 66, 0, 0, 0, 57, 0, 0, 1, 255, 254, 67, 0, 105, 0,
118, 0, 105, 0, 108, 0, 105, 0, 115, 0, 97, 0, 116, 0, 105, 0, 111>>
The title in the TIT2 frame should be "La Quête", which can be found here: <<76, 0, 97, 0, 32, 0, 81, 0, 117, 0, 234, 0, 116, 0, 101>>
but as I said, some of the bytes are 0, so I guess it is encoded in UTF-16; the title is readable once the 0 bytes are removed:
<<76, 97, 32, 81, 117, 234::utf8, 116, 101>>
The frame size in the frame header is 19, and I assumed the 10-byte header has to be subtracted from it, so 19 - 10 = 9.
So I think I need to convert from UTF-16 to UTF-8 before reading the title; otherwise the size of my title does not match.
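One thing worth checking against the byte dump: in the ID3v2.3 layout, the frame size field counts only the bytes that follow the 10-byte frame header, so nothing needs to be subtracted. Here the 19 bytes would be 1 encoding byte, 2 bytes of byte order mark, and 16 bytes of UTF-16 text (8 characters). A minimal sketch that walks the first two frames of the dump under that assumption; `ID3FrameSketch` and `next_frame/1` are hypothetical names:

```elixir
defmodule ID3FrameSketch do
  # Hypothetical helper: splits one ID3v2.3 frame off the front of a binary.
  # Assumed layout: 4-byte ID, 32-bit big-endian size, 2 flag bytes, then
  # `size` bytes of body (the size does not include the 10-byte header).
  def next_frame(
        <<id::binary-size(4), size::32, _flags::binary-size(2), body::binary-size(size),
          rest::binary>>
      ) do
    {:ok, {id, body}, rest}
  end

  # Not enough bytes left for a complete frame.
  def next_frame(_other), do: :done
end

# The two complete frames that follow the 10-byte tag header in the dump.
frames =
  <<84, 73, 84, 50, 0, 0, 0, 19, 0, 0, 1, 255, 254, 76, 0, 97, 0, 32, 0, 81, 0, 117, 0, 234, 0,
    116, 0, 101, 0, 84, 80, 69, 49, 0, 0, 0, 17, 0, 0, 1, 255, 254, 79, 0, 114, 0, 101, 0, 108,
    0, 115, 0, 97, 0, 110, 0>>

{:ok, {"TIT2", title_body}, rest} = ID3FrameSketch.next_frame(frames)
{:ok, {"TPE1", artist_body}, _rest} = ID3FrameSketch.next_frame(rest)

IO.inspect(byte_size(title_body), label: "TIT2 body bytes")
IO.inspect(byte_size(artist_body), label: "TPE1 body bytes")
```

With a body of exactly `size` bytes, the next frame ID (TPE1) lines up right after the title, which is a quick sanity check that the size was read correctly.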
Currently, the last IO.inspect prints:
{:incomplete, "ǿ﹌a ", <<0>>}
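That `{:incomplete, ...}` result is what comes out when the first 9 body bytes, including the leading encoding byte 1 and the 255, 254 byte order mark, are decoded as big-endian UTF-16, which is what a bare `:utf16` means in `:unicode.characters_to_binary/3`. A minimal sketch of decoding the whole 19-byte body instead, assuming the ID3v2.3 text frame layout (one encoding byte, where 0 means ISO-8859-1 and 1 means UTF-16 with a byte order mark); `ID3TextSketch` and `decode/1` are hypothetical names:

```elixir
defmodule ID3TextSketch do
  # Hypothetical helper: decodes the body of an ID3v2.3 text frame to UTF-8.

  # Encoding byte 0: ISO-8859-1 (Latin-1).
  def decode(<<0, text::binary>>),
    do: :unicode.characters_to_binary(text, :latin1, :utf8)

  # Encoding byte 1, BOM 0xFF 0xFE: little-endian UTF-16.
  def decode(<<1, 0xFF, 0xFE, text::binary>>),
    do: :unicode.characters_to_binary(text, {:utf16, :little}, :utf8)

  # Encoding byte 1, BOM 0xFE 0xFF: big-endian UTF-16.
  def decode(<<1, 0xFE, 0xFF, text::binary>>),
    do: :unicode.characters_to_binary(text, {:utf16, :big}, :utf8)

  # Other encoding bytes (for example those added in ID3v2.4) are not handled here.
  def decode(_other), do: {:error, :unsupported_encoding}
end

# The full 19-byte TIT2 body from the dump in this post.
title_body = <<1, 255, 254, 76, 0, 97, 0, 32, 0, 81, 0, 117, 0, 234, 0, 116, 0, 101, 0>>
IO.inspect(ID3TextSketch.decode(title_body), label: "Title")

# The 9 bytes the original code decodes, read as big-endian UTF-16.
IO.inspect(:unicode.characters_to_binary(<<1, 255, 254, 76, 0, 97, 0, 32, 0>>, :utf16, :utf8))
```

The first call should give "La Quête". The second reproduces the incomplete result: the byte pairs 1, 255 and 254, 76 are taken as two characters, and a single 0 byte is left over.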
Thx