What is the Elixir way of decoding/parsing binary data?

I’m currently using pattern matching (e.g. for a float: <<flt::float>> = <<63, 243, 190, 118, 200, 180, 57, 88>>) to decode/parse bytes. Is this the only way to decode/parse binary data? Ideally I would prefer to call a function that takes in bytes and returns the decoded value.
My intuition tells me that Float.parse() and Integer.parse() should perform the same conversion as the example above, but they seem to only parse strings.

Does this do what you need? Erlang -- erlang

Hey Chris, thanks for the reply! I’ve given that a try and it also seems to only parse/decode the string representations of floats. I am more after something that will take in some bytes that are in IEEE floating point format and return a float.

Looking at the docs for <<>>/1 I’m surprised what you have in your example doesn’t work

If you try to inspect flt it just prints out the bitstring?

Edit: sorry I didn’t read properly – your example works you’re just asking if there are alternatives! Not to my knowledge :sweat_smile:

I am not sure why you would want an alternative to binary pattern matching. If you read the documentation, you will see it can decode pretty much anything you can think of.

You can easily wrap it in a function:

def parse_binary_float(<<flt::float>>), do: flt

Here’s a version that just parses a float from the front and returns the rest of the binary:

def parse_binary_float(<<flt::float, rest::binary>>), do: {flt, rest}
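A minimal sketch of how the two clauses above could be combined into one helper whose return shape mirrors Float.parse/1 ({value, rest} or :error). The module name BinaryFloat is hypothetical and only for illustration. It assumes the input holds a 64-bit big-endian IEEE 754 double, which is what a bare ::float matches by default.

```elixir
defmodule BinaryFloat do
  # Hypothetical helper, not part of any library.

  # Take one 64-bit big-endian float from the front of the binary
  # and return it together with the remaining bytes.
  def parse(<<flt::float, rest::binary>>), do: {flt, rest}

  # Anything the first clause cannot match (for example, fewer than
  # 8 bytes) ends up here instead of raising a FunctionClauseError.
  def parse(_other), do: :error
end

# The bytes from the original question, followed by one extra byte.
{value, rest} = BinaryFloat.parse(<<63, 243, 190, 118, 200, 180, 57, 88, 42>>)
IO.inspect(value)
IO.inspect(rest)

# Too short to contain a float.
:error = BinaryFloat.parse(<<1, 2, 3>>)
```

Returning :error rather than raising is a design choice that makes the helper easier to use in a `with` or `case`; the one-line versions above are just as valid when a crash on bad input is acceptable.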

I guess I also asked the question partly out of curiosity/confusion, as there are standard decoding/parsing functions for things such as strings (i.e. Integer.parse(string), Float.parse(string)), but there didn’t really seem to be a consistent set of functions that do this for bytes. I could only find :binary.decode_unsigned(bytes)
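For comparison, here is a short sketch that puts :binary.decode_unsigned/1,2 next to the equivalent pattern matches. The sample bytes are made up for illustration. As far as I know, the :binary module has no signed counterpart, which is one reason the pattern match ends up being the general tool.

```elixir
bytes = <<1, 0>>

# Library call: big-endian by default, little-endian on request.
256 = :binary.decode_unsigned(bytes)
1 = :binary.decode_unsigned(bytes, :little)

# The same decoding with pattern matching. Sizes are given in bits.
<<big::unsigned-big-16>> = bytes
<<little::unsigned-little-16>> = bytes
256 = big
1 = little

# A signed 8-bit integer, decoded with the `signed` modifier.
<<signed::signed-8>> = <<255>>
-1 = signed
```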

Looks like pattern matching is the go-to way of decoding bytes/bits - using it like this meets my use case (can’t believe I didn’t think of this). Thanks!

Once you get the hang of binary pattern matching, you will not want to go back to doing this with functions, as it is cleaner, takes less code and is much more readable.

This is one of my favorite articles about binary pattern matching in Elixir. It’s really neat how they apply binary pattern matching to the task of parsing a PNG:

One of the great things about binary pattern matching is that it allows you to decode multiple values (which can be of different type) from a binary in one go. I just gave a talk at FOSDEM that is an introduction to bit syntax (which powers the binary pattern matching). The talk goes through all the basics and has a couple of examples after the 15-minute mark that show how bit syntax can be used to decode multiple different values at once. The talk is available here - maybe it is useful.
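To illustrate decoding several values of different types in one go, here is a small sketch. The packet layout, the module name PacketSketch and the field names are all invented for this example; they do not come from the talk or from any real protocol.

```elixir
defmodule PacketSketch do
  # Hypothetical layout, made up for illustration:
  #   1 byte   version (unsigned integer)
  #   2 bytes  sequence number (unsigned, little-endian)
  #   4 bytes  temperature (32-bit float, big-endian)
  #   4 bytes  tag (raw bytes)
  #   rest     payload
  def decode(
        <<version::8, seq::little-16, temp::float-32, tag::binary-size(4),
          payload::binary>>
      ) do
    {:ok, %{version: version, seq: seq, temp: temp, tag: tag, payload: payload}}
  end

  # Input that is too short for the fixed-size header.
  def decode(_other), do: :error
end

# Build a sample packet with the same bit syntax, then decode it.
packet = <<1, 5::little-16, 21.5::float-32, "TEMP", "payload">>
{:ok, decoded} = PacketSketch.decode(packet)
IO.inspect(decoded)
```

The same segment specifiers work on both sides: they build the binary on the right of the match and take it apart in the function head.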

And if you need to wrap it in a function, e.g. for use with pipes, @adamu’s approach is definitely valid.

I really miss Ruby’s String#unpack.

It was a good talk you gave Sunday! :+1:

What exactly do you miss there?

I second that, great talk! Although I had a hangover…

Something like some_string.unpack('C10Q>*').

Erlang/Elixir’s pattern matching is great, but sometimes I do want the quantifiers, like the 10 and * above.

<<prefix::binary-10, rest::binary>> = some_string
[prefix | read_int64(rest)]

def read_int64(<<>>), do: []
def read_int64(<<int::native-64, rest::binary>>), do: [int | read_int64(rest)]

You can definitely express that 10. The * is more of a problem, but it can be handled, as above. You can also quite easily implement such an unpack function on your own.
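As one possible way to handle both quantifiers, here is a sketch that approximates unpack('C10Q>*') for this single format string. The module and function names are hypothetical. Two assumptions differ from the snippet above: C10 is returned as ten separate integers rather than one 10-byte binary, and Q> is read as big-endian (which is what Ruby's `>` modifier means) rather than native.

```elixir
defmodule UnpackSketch do
  # Hypothetical helper covering only the 'C10Q>*' format.
  def unpack_c10_q_star(<<head::binary-size(10), rest::binary>>) do
    # C10: ten unsigned 8-bit integers.
    bytes = :binary.bin_to_list(head)

    # Q>*: every big-endian unsigned 64-bit integer that fits.
    # The bitstring generator stops when fewer than 64 bits remain,
    # so a trailing partial chunk is silently dropped.
    quads = for <<int::unsigned-big-64 <- rest>>, do: int

    bytes ++ quads
  end
end

data = <<1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1::64, 2::64>>
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2] = UnpackSketch.unpack_c10_q_star(data)
```

The comprehension plays the role of the `*` quantifier here; the recursive read_int64 above does the same job but raises if the tail is not a multiple of 8 bytes.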

Can also use the long form of that, which might be more expressive and can be used with variables:

a = 10
<<prefix::binary-size(a), rest::binary>> = "some_string"
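The size variable can also be bound earlier in the same pattern, which is handy for length-prefixed data. A minimal sketch, assuming a made-up framing of one length byte followed by that many bytes; the module name LengthPrefixed is hypothetical.

```elixir
defmodule LengthPrefixed do
  # Hypothetical framing: <<length, body (length bytes), rest>>.
  # `len` is bound by the first segment and used by the second.
  def take(<<len::8, body::binary-size(len), rest::binary>>), do: {body, rest}

  # Not enough bytes for the announced length (or empty input).
  def take(_other), do: :error
end

{"abc", "xyz"} = LengthPrefixed.take(<<3, "abc", "xyz">>)
:error = LengthPrefixed.take(<<5, "abc">>)
```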

Good talk, by the way. I wish I had seen it earlier.

Hello friends! I’m about to release to the community, under the MIT licence, my library for bidirectional parse/encode communication.

The idea is that you declare the shape of your binary data and a set of constraints, including highly dynamic ones such as dynamic length, item size, item count, dynamic variant dispatching and optional fields, using callbacks. On request, callbacks can receive fields from the current parse routine along with context options from the whole parse tree.

Most of the time you will work only with the type conversion called ‘managed’, which gives you the most human-friendly value possible, such as a number, a UTF-8 string and so on. It does not matter that the actual data is UTF-16 or that the actual numbers are hidden under a mask; I have abstractions that keep this complexity away from the developer.

There are also features such as requesting a specific type conversion (the binary version, for example) and options scoped to a particular context interface.

In addition, there are virtual fields. Virtual fields let you declare your data in the most readable and sane way, independent of the actual shape of the binary.

Everything compiles down to Elixir binary pattern matching. Using graph topology, I’m able to optimise most blocks with a known shape into a single pattern and to provide requested intermediate values only once, as close as possible to where they are used, passing them around only as function arguments without creating any allocations or intermediate objects.

You will benefit most if you are working with a complex legacy protocol and need a flow like receive → modify → send.

It has been running in production in a few of our apps for half a year, and I’m currently finishing the basic documentation.

Let me know if you are interested. I will try to finish the basic docs and publish it this week!
