No worries, I’m happy to answer.
Let’s start with packing. Whether to pack to iodata or to a binary depends on what you will do with the result. In most cases, IO functions (like sending data to a TCP socket, or writing data to a file) allow you to pass iodata, and the actual IO handler will convert it to a binary later (usually at the C level), or even write the chunks of the iodata separately.
During unpacking it is the opposite: you want to supply a binary to the parser (otherwise the parser will convert the iodata to a binary before parsing anyway).
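A minimal sketch of the difference (assuming the Msgpax API, where pack!/1 returns iodata by default and the iodata: false option forces a flat binary):

```elixir
data = %{user_id: 123, username: "username"}

# Default: iodata (a nested list of binaries), ready to hand to
# :gen_tcp.send/2 or File.write/2 without flattening first.
iodata = Msgpax.pack!(data)

# iodata: false flattens the result into a single binary up front.
binary = Msgpax.pack!(data, iodata: false)

# Both represent the same bytes:
IO.iodata_to_binary(iodata) == binary
# => true
```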
I took a data structure from the test/data.json file in the joakimk/msgpack_benchmark repository on GitHub, and it gave me the following:
:erlang.term_to_binary(data) |> byte_size()
#=> 6226
Msgpax.pack!(data, iodata: false) |> byte_size()
#=> 4999
Looks like a 19.7% difference.
iex(2)> data = %{username: "username", user_id: 123, content: "Elixir is a functional, concurrent, general-purpose programming language that runs on the Erlang virtual machine (BEAM). Elixir builds on top of Erlang and shares the same abstractions for building distributed, fault-tolerant applications. Elixir also provides a productive tooling and an extensible design. The latter is supported by compile-time metaprogramming with macros and polymorphism via protocols.[4] Elixir is successfully used in the industry by companies such as Pinterest[5] and Moz.[6] Elixir is also used for web development, by companies such as Bleacher Report and Inverse,[7] and for building embedded-systems.[8][9] The community organizes yearly events in both United States[10][11][12] and Europe[13] as well as minor local events and conferences.[14][15]"}
iex(3)> Msgpax.pack!(data, iodata: false) |> bit_size
6528
iex(4)> :erlang.term_to_binary(data) |> bit_size
6672
iex(5)> :erlang.term_to_binary(data, compressed: 9) |> bit_size
4040
iex(6)> Poison.encode!(data) |> bit_size
6616
:erlang.term_to_binary(data, compressed: 9) is not really fair, since the client probably won’t implement the decompression algorithm on its side.
Looks like the BERT spec doesn’t mention compression. With the compressed: 9 option it is indeed not really a fair comparison. JSON or MessagePack output can be passed to :zlib.compress/1 as well.
It seems to be a bit more efficient as well:
iex(11)> Msgpax.pack!(data, iodata: false) |> :zlib.compress |> bit_size
3960
iex(12)> :erlang.term_to_binary(data) |> :zlib.compress |> bit_size
4008
iex(13)> :erlang.term_to_binary(data, compressed: 9) |> :zlib.compress |> bit_size
4128 # funny
term_to_binary with compressed: 9 will only compress binaries/strings over a certain size; it does nothing else to the overall structure (other than using a different tag for compressed binaries, which I have no doubt some of the JavaScript Erlang term parsers support just fine). Passing the term_to_binary output to :zlib.compress will compress the entire structure and is indeed a more traditional way to pack for size.
For note, term_to_binary was made for speed in parsing to/from the internal Erlang and on-the-wire formats; it is not made to be perfectly compact, but rather to be very fast. zlib is a perfectly fine method of compressing it, and indeed it might be cool to add that to a timed benchmark as well.
As for the term_to_binary output with compressed: 9 becoming larger when passed to zlib: the internal binary is already compressed, which raised the entropy of the overall stream, making it harder to compress, hence the larger final size. It is expected, not just funny. It is better to compress the raw stream than to compress parts of it and then compress all of those together.
It adds ~30 µs/op to every library (for packing).
Sorry for the messy names: compress strings bert (compress: 9) here means :erlang.term_to_binary(data, compressed: 9); the rest of the compress * entries are compressed with :zlib.compress.
## PackBench
benchmark name iterations average time
atoms bert 1000000000 0.01 µs/op
strings bert 1000000000 0.01 µs/op
strings msgpax iodata 1000000 1.97 µs/op
atoms msgpax iodata 1000000 2.02 µs/op
atoms msgpax binary 1000000 2.40 µs/op
strings msgpax binary 1000000 2.47 µs/op
compress strings bert (compress: 9) 100000 26.08 µs/op
compress atoms bert 50000 30.05 µs/op
compress strings bert 100000 30.30 µs/op
compress atoms msgpax iodata 50000 34.06 µs/op
compress strings msgpax iodata 50000 34.42 µs/op
compress atoms msgpax binary 50000 35.02 µs/op
compress strings msgpax binary 50000 37.38 µs/op
atoms poison 50000 45.89 µs/op
strings poison 50000 46.79 µs/op
compress strings poison 20000 77.60 µs/op
compress atoms poison 20000 80.91 µs/op
Unfortunately it does not handle tuples, so you cannot just throw any Erlang term at it for packing/encoding. (JSON has the same limitation.)
The same is true for pids, refs, and ports, but those are less painful constraints.
:erlang.term_to_binary and :erlang.binary_to_term can handle everything you can imagine, even anonymous functions.
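For example (a sketch; round-tripping an anonymous function like this only works reliably when the decoding VM runs the same version of the module that defined the fun):

```elixir
term = {:ok, self(), fn x -> x * 2 end}

# Encode the whole term, tuple and fun included, then decode it back.
bin = :erlang.term_to_binary(term)
{:ok, pid, fun} = :erlang.binary_to_term(bin)

pid == self()  # => true
fun.(21)       # => 42
```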
Has anyone ever seen a way to parse the output of :erlang.term_to_binary
with JavaScript? For example using https://github.com/okeuday/erlang_js or https://github.com/rustyio/BERT-JS.
There are quite a few JavaScript libraries that can, actually. I was recently looking at… I forget what it was called, I think it was named after a flower or something… But yeah, there are quite a few.
Interesting. I wonder if that’s a useful way to quickly send data to a front end.
Just do not ‘accept’ information from the front end like that; you’d be surprised what code can run when deserializing something. ^.^;
It ‘might’ be safe, but I’d want to test it really, really well…
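One partial mitigation on the receiving side (a sketch; the :safe option only rejects payloads that would create new atoms or reference external functions, it does not make untrusted input fully safe):

```elixir
bin = :erlang.term_to_binary(%{hello: "world"})

# The :hello atom already exists in this VM, so :safe lets the term through.
:erlang.binary_to_term(bin, [:safe])
# => %{hello: "world"}

# A payload carrying an atom unknown to this VM would raise ArgumentError instead.
```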
We are in a heterogeneous environment and I am just hoping there might be a place for Elixir in our stack, so Elixir/Erlang-specific considerations unfortunately cannot be the primary drivers for choosing an encoding format.
I perfectly understand this. But it’s not only Erlang and Elixir that have tuples… For me, tuples are the only problem with it.
In MessagePack, non-standard (or application-specific) data structures can be serialized by using the Extension type.
For me, tuples are a standard and not an application-specific data type.
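As an illustration of what carrying tuples over the Extension type could look like (a sketch under assumptions: the type number 42 and the pack-the-elements-as-a-list encoding are my own choices, and the code assumes Msgpax’s Msgpax.Ext.new/2 and Msgpax.Ext.Unpacker APIs):

```elixir
defmodule TupleUnpacker do
  @moduledoc "Hypothetical unpacker turning ext type 42 back into a tuple."
  @behaviour Msgpax.Ext.Unpacker

  @impl true
  def unpack(%Msgpax.Ext{type: 42, data: payload}) do
    {:ok, payload |> Msgpax.unpack!() |> List.to_tuple()}
  end
end

# Pack a tuple by encoding its elements as a MessagePack list
# inside the extension payload.
tuple = {1, "two", 3.0}
payload = tuple |> Tuple.to_list() |> Msgpax.pack!(iodata: false)
packed = Msgpax.pack!(Msgpax.Ext.new(42, payload), iodata: false)

Msgpax.unpack!(packed, ext: TupleUnpacker)
```

Note that atoms inside the tuple would come back as strings, since MessagePack itself has no atom type.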
(MessagePack also struggles with large integers. Those are also a common data type in Python and Ruby, for instance.)
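For illustration (a sketch; MessagePack’s integer family tops out at 64 bits, so Msgpax is expected to refuse anything outside that range, while the external term format handles arbitrary precision):

```elixir
big = 123_456_789_012_345_678_901_234_567_890  # far beyond the 64-bit range

# term_to_binary round-trips bignums without trouble.
^big = big |> :erlang.term_to_binary() |> :erlang.binary_to_term()

# MessagePack has no wire type for this, so packing is expected to fail.
{:error, _reason} = Msgpax.pack(big)
```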
I’m not saying that MessagePack is bad, just trying to highlight some weaknesses in it.