BinStruct - a library for writing declarative code to get a rich generated set of binary parsing/encoding features

Ridtt · November 22, 2024, 11:13am

Hex: bin_struct | Hex
Git: GitHub - 4ait/bin_struct
Docs: bin_struct v0.2.13 — Documentation

BinStruct is a library whose main function is to convert declarations that are as readable as possible into robust and performant implementations. It generates a set of functions that make the parse → decode → create_new → send process trivial.

What BinStruct is not

BinStruct is not a protocol itself; there is no goal to replace ASN.1, protobuf, Erlang binary terms, or any other protocol. If you can solve your problem using an existing protocol, stick with it.

BinStruct is not a replacement for binary pattern matching. If your job can be done with pattern matching alone, it will always be better to use it directly. This lib adds a layer of complexity in order to achieve its main goal: write declarations, generate implementations automatically. When complexity grows, the only sane way to keep up with it is to have a general declarative structure for each part you are working with.
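For comparison, here is a minimal sketch of the "pattern match only" route for a single PNG chunk, written in plain Elixir without BinStruct. The module name `RawPngChunk` and the return shapes are assumptions chosen for illustration, not anything generated by the library.

```elixir
defmodule RawPngChunk do
  # Hypothetical module: plain Elixir, no BinStruct involved.

  # One PNG chunk is a 4-byte big-endian length, a 4-byte type,
  # `length` bytes of data and a 4-byte big-endian CRC.
  # Whatever follows the chunk is returned untouched as `rest`.
  def parse(
        <<length::32-big, type::binary-size(4), data::binary-size(length), crc::32-big,
          rest::binary>>
      ) do
    {:ok, %{length: length, type: type, data: data, crc: crc}, rest}
  end

  # Anything that does not contain a complete chunk.
  def parse(_other), do: :not_enough_bytes
end
```

When a format stays this small, a direct match like this is hard to beat. The declarative approach starts to pay off once there are many such structures, validated enums, and dynamic parts to keep consistent.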

BinStruct is by no means a framework and does not force you to follow any specific structure of how its parts will be used together. Each BinStruct you create is completely self-contained and can be used as you see fit. Whether you want to validate CRC, add encryption, or implement something else inside or outside—it’s entirely up to you, and the library imposes no restrictions on these choices.
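As an illustration of doing such a check outside the struct, here is a small sketch of PNG CRC validation in plain Elixir. The module `PngCrcCheck` is hypothetical and works on a plain map of already parsed field values; it does not use any BinStruct API.

```elixir
defmodule PngCrcCheck do
  # Hypothetical helper: takes the field values of one chunk,
  # however they were parsed.

  # The PNG CRC-32 covers the chunk type followed by the chunk data
  # (the length field is excluded). :erlang.crc32/1 accepts iodata,
  # so the two binaries do not need to be concatenated first.
  def valid?(%{type: type, data: data, crc: crc}) do
    :erlang.crc32([type, data]) == crc
  end
end

# The IEND chunk has no data and always carries the CRC 0xAE426082.
PngCrcCheck.valid?(%{type: "IEND", data: "", crc: 0xAE426082})
# => true
```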

What BinStruct is primarily

BinStruct is a tool. A tool that supports the developer from the very beginning with a rich set of generated features, allowing you to explore data at every step, right through to running your app in production.

I believe BinStruct is an essential tool for developers. Simply transferring declarations from your protocol documentation into BinStruct’s special syntax is enough to start parsing your data, decoding it, and exploring its structure. This lets you build an understanding of how to proceed next. It is especially helpful when working with a protocol that is new to you. If you’re unsure where to start or what to focus on, just transfer what you see in the documentation into BinStruct declarations and experiment. At some point, things will start falling into place, and you might even find that the application almost writes itself before you realize it.

Even the smallest fragments you implement can already be put to use. You can parse and decode binary data to gain a better understanding of what you’re dealing with without needing to fully implement every detail or dynamic callback. You’ll gradually build out your protocol implementation step by step, and over time, these pieces will naturally connect as your codebase grows. You don’t need all the advanced features like virtual fields, auto-generated fields (builders), or type conversions beyond the basic managed (human-readable) one right away. You can always add them later if you think they’ll make the process easier.

Basic syntax overview

 
 defmodule PngChunk do
 
   use BinStruct
 
   #all dynamic behaviour is a callback
   #if we do not specify type_conversion, it is always 'managed', also known as 'human readable'
   register_callback &data_length/1, length: :field
 
   #with fields you build shape of your binary data
   field :length, :uint32_be
 
   #use expanded constructs whenever possible; this is both easier to read and will be validated at parse time
   #it's always better to expand arrays/flags/enums even if you don't use them for now; it will help moving forward,
   #as you will have a more complete picture
   #it also gives the struct the opportunity to be dispatched as a dynamic variant later (read it as: if we received something whose type is distinct from those listed below, it's not this struct, and we can catch it via an upper variant_of later)
   field :type, {
     :enum,
     %{
       type: :binary,
       values: [
         "IHDR",
         "PLTE",
         "IDAT",
         "IEND",
         "cHRM",
         "gAMA",
         "iCCP",
         "sBIT",
         "sRGB",
         "bKGD",
         "hIST",
         "tRNS",
         "pHYs",
         "sPLT",
         "tIME",
         "tEXt",
         "zTXt",
         "iTXt"
       ]
     }
   }, length: 4
 
   #consume the dynamic behaviour via length_by
   field :data, :binary, length_by: &data_length/1
 
   field :crc, :uint32_be
 
   #dynamic behaviour implementation
   #we always work with the 'managed' type conversion; in this case the length field is automatically converted to an Elixir number
   #and we return this number as is
   defp data_length(length), do: length
 
 end

Performance notes

The library compiles into Elixir binary pattern matches and uses optimizations such as composing all parts of known size into a single pattern, always inlining encoders and static values, and caching every requested value.
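To make "composing parts of known size into a single pattern" concrete, here is a hand-written sketch in plain Elixir. It is only an illustration of the idea and is not the code BinStruct actually generates; the module and function names are assumptions.

```elixir
defmodule ComposedPattern do
  # Step by step: one separate match per field.
  def parse_stepwise(bin) do
    with <<length::32-big, rest::binary>> <- bin,
         <<type::binary-size(4), rest::binary>> <- rest,
         <<data::binary-size(length), rest::binary>> <- rest,
         <<crc::32-big, rest::binary>> <- rest do
      {:ok, {length, type, data, crc}, rest}
    else
      _ -> :not_enough_bytes
    end
  end

  # Composed: the fixed-size fields (length and type) share one pattern.
  # The dynamic part is matched once its size is known, together with
  # the fixed-size CRC that follows it.
  def parse_composed(<<length::32-big, type::binary-size(4), rest::binary>>) do
    case rest do
      <<data::binary-size(length), crc::32-big, rest::binary>> ->
        {:ok, {length, type, data, crc}, rest}

      _ ->
        :not_enough_bytes
    end
  end

  def parse_composed(_other), do: :not_enough_bytes
end
```

Both functions return the same result for the same input; the composed version simply performs fewer separate matches.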

If, in registered callbacks, field A is requested by both B and C, A will be converted to the requested type conversion before B (as late as possible), and the same value will later be passed to C.
All functions, except for the main public functions such as parse/2, are declared in the same module and marked as private (defp), giving maximum optimization opportunities for the Erlang compiler (erlc).

You can expect performance equal to manually written pattern matches, with some differences: modular structure, validation after each step, and creating structs as the result. It is not correct to compare simple manual parsing patterns directly to what this library does.

I created a small intro post about it a few days ago: What is the Elixir way of decoding/parsing binary data? - #20 by Ridtt

I also included an example implementation of a PNG parser using BinStruct, as an alternative to the raw pattern matching approach suggested in the article in that thread.

For anyone interested, I suggest starting with the PNG example: bin_struct/examples/png.exs at master · 4ait/bin_struct · GitHub

Then the docs for the main macros: BinStruct — bin_struct v0.2.13

And then the docs for types: binary — bin_struct v0.2.13

More complex examples:

When things are hidden in integer: bin_struct/examples/extraction_from_integer.exs at master · 4ait/bin_struct · GitHub

When things are hidden in buffer: bin_struct/examples/extraction_from_buffer.exs at master · 4ait/bin_struct · GitHub

Implementing transport packet: bin_struct/examples/packet_via_higher_order_macro.exs at master · 4ait/bin_struct · GitHub

Dynamically working with recursive data structures: bin_struct/examples/recursive_sequence.exs at master · 4ait/bin_struct · GitHub

Future performance improvements:

Problem: you don’t always need all values to be decoded. Decoding everything will never be the optimal solution, whether or not there is a function for decoding a single field. We can solve this with compile-time use cases.
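The idea can be sketched by hand in plain Elixir. The module below is hypothetical and does not use the BinStruct use-case API; it only shows why a decode function compiled for the fields you actually need can skip work that a full decode has to do.

```elixir
defmodule SelectiveDecode do
  # Hypothetical: parsing keeps every field as raw bytes.
  def parse(<<length::binary-size(4), type::binary-size(4), rest::binary>>) do
    <<n::32-big>> = length

    case rest do
      <<data::binary-size(n), crc::binary-size(4), rest::binary>> ->
        {:ok, %{length: length, type: type, data: data, crc: crc}, rest}

      _ ->
        :not_enough_bytes
    end
  end

  def parse(_other), do: :not_enough_bytes

  # Full decode: converts every field, including the potentially
  # large data field (turned into a list of bytes here as a stand-in
  # for an expensive conversion).
  def decode(%{length: <<length::32-big>>, type: type, data: data, crc: <<crc::32-big>>}) do
    %{length: length, type: type, data: :binary.bin_to_list(data), crc: crc}
  end

  # Decode specialised for a caller that only needs the length:
  # the data and crc fields are never converted.
  def decode_only_length(%{length: <<length::32-big>>}) do
    %{length: length}
  end
end
```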

Compile time decode_only use case: bin_struct/examples/compiled_decode_use_case.exs at master · 4ait/bin_struct · GitHub

Ridtt · November 22, 2024, 11:22am

I would be very pleased if you could share ideas that could be implemented using BinStruct; I could help you with their implementation and later add them to the examples. I’m too tired right now, as I’ve spent about a month preparing this library for publication. Custom types and flexible type conversions were added specifically for the community. I apologize for the messy formatting in the documentation and the general disorder. I am continuing to work on these improvements.

Feel free to tell me anything you think about it and to ask any questions.

VictorGaiva · November 22, 2024, 4:28pm

Very interesting.

It was a bit hard to understand what the lib actually does from this post. Some example use cases would help.

I was able to understand better by reading the docs.

Great job with the project. Very specific use case, but greatly done.

Ridtt · November 22, 2024, 5:11pm

I’m sorry again for this mess. I have a lot of examples from private products which I can’t paste. I want to grow the use cases and docs every day from now on. You can help me with ideas: any protocol you have worked with but aren’t happy with the code for, or one you want to work with, and I can add such examples. I just don’t have enough time for everything; I was busy at work and with implementing features, testing, and writing basic docs for this lib.

Asd · November 23, 2024, 5:03am

Hi, good library, I’ve read the example and it looks promising! I’ve briefly peeked into the code and it looks interesting, and it has some special formatting, I love that. I also found that some modules are empty and exist only for the sake of documentation. I’d suggest just writing separate .md doc files and exposing them in hexdocs (absinthe documentation is a good example of how to do this).

I’ll definitely give it a deeper look in the near future!

Ridtt · November 23, 2024, 5:32am

Hello there, thank you. I have just added more complex examples, as I promised in the intro post.

When things are hidden in integer: bin_struct/examples/extraction_from_integer.exs at master · 4ait/bin_struct · GitHub

When things are hidden in buffer: bin_struct/examples/extraction_from_buffer.exs at master · 4ait/bin_struct · GitHub

Implementing transport packet: bin_struct/examples/packet_via_higher_order_macro.exs at master · 4ait/bin_struct · GitHub

Ridtt · November 23, 2024, 5:36am

I like the way automatic doc tests currently work. Can I achieve the same with separate .md files?

What exactly is the special formatting?

Some reasoning:

My main goal is to keep things as they are. Simple things will be simple anyway. Hard things will be hard anyway. I don’t see a reason for any shortcuts, for example for the situation where some length may be directly inferred from another field, as most libraries I have seen do. Every attempt at introducing a multi-field writer or any other complex behavior has failed.

I ended up with the idea that dynamic behavior is always a callback: there is a single way of creating a virtual field during parse/decode using read_by, and a single way of creating anything automatically in the new context using a builder. I got this far with the idea of not mixing things, and I like it. Once you have finished writing your declaration and tested it, you can forget about what is inside anyway. You have a guarantee that if it works alone, it will work the same inside any tree. A good way to reuse parts is to compose them into a higher-order macro, which again doesn’t care about the presence of shortcuts and so on.

Asd · November 23, 2024, 3:58pm

I meant the code formatting. Default is mix format. You use something different, with newlines before end and after def and a lot of other things. I just noted that. I love reading code with uncommon styles of formatting and uncommon approaches to writing code

In my projects I just copy-paste the examples into tests. Doctests are good, but these modules will be loaded in every system using the library and this may be a problem for embedded devices (which are top users of the binary protocols)

And this also brings me to another question. Since this library is generating parsers, is it possible to use it with runtime: false?
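For readers unfamiliar with the option, `runtime: false` is set on the dependency in `mix.exs`. The fragment below only shows where the option goes; the version requirement is an assumption based on the docs link in the first post, and whether BinStruct works correctly with this setting is not confirmed anywhere in this thread.

```elixir
# mix.exs (fragment, not a complete file)
defp deps do
  [
    # runtime: false keeps the dependency out of the runtime
    # applications, so it is neither started nor included in releases.
    # Version requirement is an assumption for illustration.
    {:bin_struct, "~> 0.2.13", runtime: false}
  ]
end
```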

Ridtt · November 23, 2024, 4:02pm

Is there a reason to set the runtime option to false if there is no application module anyway? Will it affect how the library can read envs via Application.fetch_env? There are some tools for devs, such as encoders and built-in custom types, and there will probably be more besides the macros. The generated code is self-sufficient, but if I understand the Elixir compilation model correctly, I still can’t strip the macro generation modules from production, as they need to generate code at prod compile time. Empty modules should be no problem though. Maybe I will strip them, we will see. But to be honest I don’t see a real impact from those. It would be interesting to strip the Macro.* modules after compilation, but I don’t know of such tools and don’t think it changes anything. I think we can discuss it if someone from AtomVM wants to use BinStruct; Nerves projects should not be affected at all by some empty or no-op modules after compilation.

Thank you for mentioning code style and formatting. I don’t use any formatter globally; I use the basic formatter from JetBrains IDEA for part of the code, and in general I just try to make things as readable as I can.

I also expect people to primarily write server applications using this lib, as I do. But I will try to help adapt it to any use case.

Asd · November 23, 2024, 8:02pm

BinStruct* modules are called only in compile-time, so there’s only one case when these modules will be required after compilation (aka in runtime). And that’s when someone wants to be able to call macros in iex on running app. I don’t know why anyone would do this in prod, but who knows. It is possible that someone may want to just send module definition into iex as a form of hotfix. In all other cases, there’s no need for modules which are only called in compile time to be present in runtime. Even hot-reloading with relup/appup distributes already compiled .beam files

True, but empty modules and unused modules are usually loaded in prod and are always distributed with release. So that’s unnecessary memory usage. But I agree that it’s very minor and not important in most of the cases. I was just curious when I asked about runtime: false

Asd · November 23, 2024, 9:25pm

I’ve sent a documentation improvement PR. Feel free to add changes to it

Ridtt · November 24, 2024, 8:41am

Got PR, working right now to merge! Thank you.

I have added sections on what this library is and what it is not, plus a basic syntax overview, to the main post.

Update: the new docs are merged and published. My editor is not currently happy with the code inside tags, but the docs look good and the tests are passing.

Ridtt · November 26, 2024, 11:26am

I have added performance notes along with a new compile-time concept: use cases.

Now you can compile the best possible behavior for your particular needs. The first area with room for improvement was the decode functions. It is now possible to compile decode only for the fields you are using, ensuring the best possible performance and zero overhead from abstractions like unused virtual_fields.