Anyone using MessagePack?

No worries, I’m happy to answer.
Let’s start with packing. Whether to pack to iodata or to a binary depends on what you will do with the result. In most cases IO functions (like sending data to a TCP socket, or writing data to a file) accept iodata, and the actual IO handler will convert it to a binary later (usually at the C level), or even write the chunks of the iodata separately.
Unpacking is the opposite: you want to supply binary data to the parser (otherwise the parser will convert the iodata to a binary before parsing anyway).
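To make that concrete, here’s a minimal sketch (the commented-out socket call is an assumption on my part; Msgpax returns iodata by default and takes `iodata: false` to force a binary):

```elixir
data = %{user_id: 123, username: "username"}

# Msgpax.pack!/1 returns iodata by default; IO functions accept it as-is.
iodata = Msgpax.pack!(data)

# Force a flat binary only when you really need one (e.g. to measure its size).
binary = Msgpax.pack!(data, iodata: false)

# Either form sends the same bytes down the wire:
# :gen_tcp.send(socket, iodata)
IO.iodata_to_binary(iodata) == binary
#=> true
```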


I took a data structure from the test/data.json file in the joakimk/msgpack_benchmark repo on GitHub and it gave me the following:

:erlang.term_to_binary(data) |> byte_size()
#=> 6226
Msgpax.pack!(data, iodata: false) |> byte_size()
#=> 4999

Looks like 19.7% difference.

iex(2)> data = %{username: "username", user_id: 123, content: "Elixir is a functional, concurrent, general-purpose programming language that runs on the Erlang virtual machine (BEAM). Elixir builds on top of Erlang and shares the same abstractions for building distributed, fault-tolerant applications. Elixir also provides a productive tooling and an extensible design. The latter is supported by compile-time metaprogramming with macros and polymorphism via protocols.[4] Elixir is successfully used in the industry by companies such as Pinterest[5] and Moz.[6] Elixir is also used for web development, by companies such as Bleacher Report and Inverse,[7] and for building embedded-systems.[8][9] The community organizes yearly events in both United States[10][11][12] and Europe[13] as well as minor local events and conferences.[14][15]"}
iex(3)> Msgpax.pack!(data, iodata: false) |> bit_size
6528
iex(4)> :erlang.term_to_binary(data) |> bit_size
6672
iex(5)> :erlang.term_to_binary(data, compressed: 9) |> bit_size
4040
iex(6)> Poison.encode!(data) |> bit_size
6616

:erlang.term_to_binary(data, compressed: 9) is not really fair, since the client probably won’t implement the decompression algorithm on its side.

Looks like the BERT spec doesn’t mention compression. With the compressed: 9 option it is indeed not really a fair comparison. JSON or MessagePack output can be passed to :zlib.compress/1 as well.
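For example, a quick sketch of that round trip (the `data` term here is just filler):

```elixir
data = %{content: String.duplicate("Elixir runs on the BEAM. ", 40)}

# Pack first, then deflate the resulting binary with zlib.
packed = Msgpax.pack!(data, iodata: false)
deflated = :zlib.compress(packed)

# The receiver inflates first, then unpacks as usual.
^packed = :zlib.uncompress(deflated)
```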


It seems to be a bit more efficient as well

iex(11)> Msgpax.pack!(data, iodata: false) |> :zlib.compress |> bit_size
3960
iex(12)> :erlang.term_to_binary(data) |> :zlib.compress |> bit_size
4008
iex(13)> :erlang.term_to_binary(data, compressed: 9) |> :zlib.compress |> bit_size
4128 # funny

term_to_binary with compressed: 9 will only compress binaries/strings over a certain size; it does nothing else to the overall structure (other than using a different tag for compressed binaries, which I’ve no doubt some of the JavaScript Erlang term parsers support just fine). Passing the term_to_binary output to :zlib.compress will compress the entire structure and is indeed the more traditional way to pack for size.

Of note, term_to_binary was made for speed when converting between the internal Erlang format and the on-the-wire format; it is not made to be perfectly compact, but rather to be very fast. zlib is a perfectly fine method of compressing it, and indeed that might be cool to add to a timed benchmark as well. :slight_smile:

As for the term_to_binary with compressed: 9 output being larger after it is passed to zlib: the internal binary is already compressed, which raised the entropy of the overall stream and made it harder to compress, hence the larger final size. It is expected, not just funny. It is better to compress the raw stream once than to compress parts of it and then compress those parts all together. :slight_smile:
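A quick way to see this for yourself (sizes will vary with the input, so treat the comparison, not the exact numbers, as the point):

```elixir
data = %{content: String.duplicate("Elixir runs on the BEAM. ", 40)}

plain = :erlang.term_to_binary(data)
internal = :erlang.term_to_binary(data, compressed: 9)

# Compress the raw external-term stream once...
once = :zlib.compress(plain)
# ...versus deflating a stream whose payload is already deflated.
twice = :zlib.compress(internal)

byte_size(once) <= byte_size(twice)
# typically true: the pre-compressed payload has higher entropy
```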


It adds ~30 µs/op to every library (for packing).

Sorry for the messy names (compress strings bert (compress: 9) here means :erlang.term_to_binary(data, compressed: 9)); the rest of the compress * entries are compressed with :zlib.compress.

## PackBench
benchmark name                       iterations   average time
atoms bert                           1000000000   0.01 µs/op
strings bert                         1000000000   0.01 µs/op
strings msgpax iodata                   1000000   1.97 µs/op
atoms msgpax iodata                     1000000   2.02 µs/op
atoms msgpax binary                     1000000   2.40 µs/op
strings msgpax binary                   1000000   2.47 µs/op
compress strings bert (compress: 9)      100000   26.08 µs/op
compress atoms bert                       50000   30.05 µs/op
compress strings bert                    100000   30.30 µs/op
compress atoms msgpax iodata              50000   34.06 µs/op
compress strings msgpax iodata            50000   34.42 µs/op
compress atoms msgpax binary              50000   35.02 µs/op
compress strings msgpax binary            50000   37.38 µs/op
atoms poison                              50000   45.89 µs/op
strings poison                            50000   46.79 µs/op
compress strings poison                   20000   77.60 µs/op
compress atoms poison                     20000   80.91 µs/op

Unfortunately it does not handle tuples, so you cannot just throw in any Erlang term for packing/encoding. (JSON has the same limitation.)

The same is true for pids, refs, and ports, but those are just less painful constraints.

:erlang.term_to_binary and binary_to_term can handle everything you can imagine, even anonymous functions.
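For instance (with the usual caveat that an anonymous function only deserializes cleanly on a node running the same module version):

```elixir
# A tuple mixing a pid, a ref, and an anonymous function: none of these
# survive JSON or MessagePack, but the external term format handles them.
term = {:ok, self(), make_ref(), fn x -> x + 1 end}

binary = :erlang.term_to_binary(term)
{:ok, _pid, _ref, fun} = :erlang.binary_to_term(binary)

fun.(41)
#=> 42
```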

Has anyone ever seen a way to parse the output of :erlang.term_to_binary with JavaScript? For example using https://github.com/okeuday/erlang_js or https://github.com/rustyio/BERT-JS.

There is also https://github.com/synrc/n2o/blob/master/priv/protocols/bert.js from n2o.


There are quite a few JavaScript libraries that can, actually. I was recently looking at… I forget what it was called, I think it was named after a flower or something. But yeah, there are quite a few.


Interesting. I wonder if that’s a useful way to quickly send data to a front end.

Just do not ‘accept’ information from the front-end like that; you’d be surprised what code can run when deserializing something. ^.^;

It ‘might’ be safe, but I’d want to test it really really well…


We are in a heterogeneous environment and I am just hoping there might be a place for Elixir in our stack, so Elixir/Erlang-specific considerations unfortunately cannot be the primary drivers for choosing an encoding format.

I perfectly understand this. But it’s not only Erlang and Elixir that have tuples… For me, tuples are the only problem with it.

In MessagePack, non-standard (or application-specific) data structures can be serialized by using the Extension type.
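As a rough sketch of working around the tuple gap, you can also tag tuples yourself before packing. The "$tuple" marker and TupleCodec module below are my own invention, not part of any library (Msgpax additionally ships a Msgpax.Ext struct for proper extension payloads):

```elixir
defmodule TupleCodec do
  # Hypothetical convention: a tuple travels as a list tagged with "$tuple".
  def encode(tuple) when is_tuple(tuple), do: ["$tuple" | Tuple.to_list(tuple)]
  def decode(["$tuple" | elems]), do: List.to_tuple(elems)
end

packed = {"point", 1, 2} |> TupleCodec.encode() |> Msgpax.pack!(iodata: false)

packed |> Msgpax.unpack!() |> TupleCodec.decode()
#=> {"point", 1, 2}
```

Note the elements here are strings and integers on purpose: atoms would come back as strings after a MessagePack round trip.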


For me, tuples are a standard and not an application-specific data type.
(MessagePack also struggles with large integers, which are a common data type in Python and Ruby, for instance.)

I’m not saying that MessagePack is bad, just trying to highlight some weaknesses in it.