Is anyone using MessagePack?
How is your experience? Which lib are you using?
I use either JSON or fixed-size C structs; kinda extreme, but it suits my use cases, which deal with large binary files and metadata.
At Football Addicts we use MessagePack intensively. It’s about 3 times faster than JSON and has a smaller footprint.
One extra good thing about MessagePack is that it better preserves the shape of the original data:
Poison.encode!(%{1 => 1}) |> Poison.decode!
#=> %{"1" => 1}
Msgpax.pack!(%{1 => 1}) |> Msgpax.unpack!
#=> %{1 => 1}
It’s also possible to define application-specific types using the Extension type.
We use the https://hex.pm/packages/msgpax library.
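As a rough sketch of what an application-specific type can look like (hedged: this follows the Msgpax extension API as I understand it, with Msgpax.Ext.new/2 on the packing side and an unpacker module passed via the :ext option; the type id 42 and the DateExt module are made up for illustration, so check the package docs for the exact callbacks):

```elixir
defmodule DateExt do
  @behaviour Msgpax.Ext.Unpacker

  @ext_type 42  # application-chosen extension type id (0..127)

  # Pack a Date as extension type 42 carrying its ISO 8601 string.
  def pack(%Date{} = date) do
    Msgpax.Ext.new(@ext_type, Date.to_iso8601(date)) |> Msgpax.pack!()
  end

  # Called by Msgpax.unpack!(bin, ext: DateExt) for extension values.
  @impl true
  def unpack(%Msgpax.Ext{type: @ext_type, data: data}) do
    {:ok, Date.from_iso8601!(data)}
  end
end

# packed = DateExt.pack(~D[2017-05-01])
# Msgpax.unpack!(packed, ext: DateExt)
```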
Have y’all ever published a benchmark comparing the two? That would be good to have as a reference.
I seem to remember the TechEmpower benchmark discussion indicating that Elixir’s JSON handling was actually one of the biggest reasons it didn’t perform better.
Thank you for the info! Any downsides in your experience/opinion?
I haven’t published any benchmarks of them. The actual difference depends on the shape of the data being serialized, but let’s measure a 400-element map with the following script:
data =
  Stream.iterate({0, true}, fn {key, val} -> {key + 1, val} end)
  |> Stream.map(fn {key, val} -> {Integer.to_string(key), val} end)
  |> Enum.take(400)
  |> Map.new()

# # data = Poison.encode!(data)
# for _ <- 1..10 do
#   Poison.encode!(data, iodata: true)
#   # Poison.decode!(data)
# end
# count = 1000
# 1..count
# |> Enum.reduce(0, fn _, total ->
#   :timer.tc(Poison, :encode!, [data, [iodata: true]]) |> elem(0) |> Kernel.+(total)
#   # :timer.tc(Poison, :decode!, [data]) |> elem(0) |> Kernel.+(total)
# end)
# |> Kernel./(count)
# |> IO.puts()

# data = Msgpax.pack!(data, iodata: false)
for _ <- 1..10 do
  Msgpax.pack!(data)
  # Msgpax.unpack!(data)
end

count = 1000

1..count
|> Enum.reduce(0, fn _, total ->
  :timer.tc(Msgpax, :pack!, [data]) |> elem(0) |> Kernel.+(total)
  # :timer.tc(Msgpax, :unpack!, [data]) |> elem(0) |> Kernel.+(total)
end)
|> Kernel./(count)
|> IO.puts()
On my machine it gives:
Poison v3.1.0 encode/decode: 271.962/326.565 us
Msgpax v1.1.0 pack/unpack: 126.832/101.857 us
Msgpax master pack/unpack: 114.59/67.12 us
Note that Msgpax has been optimized recently, so I added master branch as well.
There is also the joakimk/msgpack_benchmark repository on GitHub (a simple benchmark of Msgpax vs Poison, the Elixir libs for MessagePack and JSON), though it uses old versions.
Only one minor downside: the output is not human-readable. However, I don’t think it has ever been an issue for us.
That’s a very manageable downside. Thank you for providing all the insight; now I am less worried about our planned switch to MessagePack.
If you are wanting speed and small size, it is hard to beat Erlang’s native term format: :erlang.term_to_binary/:erlang.binary_to_term and so forth. There are also multiple JavaScript parsers for it (though they require typed arrays, since the Erlang term format is a fully packed binary format).
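For reference, the round trip needs nothing beyond OTP itself (the map contents here are just an example):

```elixir
# Serialize any Erlang/Elixir term to the external term format and back.
data = %{"user_id" => 123, "active" => true}

bin = :erlang.term_to_binary(data)
^data = :erlang.binary_to_term(bin)

# The :compressed option can shrink repetitive payloads at some CPU cost.
small = :erlang.term_to_binary(data, [:compressed])
^data = :erlang.binary_to_term(small)
```

For untrusted input, :erlang.binary_to_term/2 with the :safe option refuses to create new atoms or external function references.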
If the stars line up right, there might be a few Elixir services; the majority will be Python and Node, though.
I haven’t, but I think its performance should be around MessagePack’s, depending on the actual implementation.
I see one downside though: BERT is less space-efficient.
For example, the term [1, 2, 3] encodes as:
<<107, 0, 3, 1, 2, 3>> # BERT
<<147, 1, 2, 3>>       # MessagePack
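As a side note, :erlang.term_to_binary/1 output also carries a leading version byte (131), so the on-the-wire size is one byte more than the raw BERT encoding shown above:

```elixir
# A list of small integers (all 0..255, fewer than 65536 elements)
# encodes as STRING_EXT (tag 107) with a 2-byte big-endian length,
# preceded by the external-term-format version byte 131.
<<131, 107, 0, 3, 1, 2, 3>> = :erlang.term_to_binary([1, 2, 3])
```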
I wonder if I’m doing something wrong, but here’s the result I got from Benchfella:
Settings:
duration: 1.0 s
## PackBench
[02:34:39] 1/8: bert
[02:34:47] 2/8: msgpax binary
[02:34:50] 3/8: msgpax iodata
[02:34:53] 4/8: poison
## UnpackBench
[02:34:56] 5/8: bert
[02:35:05] 6/8: msgpax binary
[02:35:06] 7/8: msgpax iodata
[02:35:08] 8/8: poison
Finished in 33.08 seconds
## PackBench
benchmark name   iterations   average time
bert             1000000000   0.01 µs/op
msgpax iodata    1000000      2.16 µs/op
msgpax binary    1000000      2.70 µs/op
poison           50000        48.35 µs/op
## UnpackBench
benchmark name   iterations   average time
bert             10000000     0.80 µs/op
msgpax binary    1000000      1.18 µs/op
msgpax iodata    1000000      1.69 µs/op
poison           100000       29.45 µs/op
Packing
defmodule PackBench do
  use Benchfella

  @data %{username: "username", user_id: 123, content: "Elixir is a functional, concurrent, general-purpose programming language that runs on the Erlang virtual machine (BEAM). Elixir builds on top of Erlang and shares the same abstractions for building distributed, fault-tolerant applications. Elixir also provides a productive tooling and an extensible design. The latter is supported by compile-time metaprogramming with macros and polymorphism via protocols.[4] Elixir is successfully used in the industry by companies such as Pinterest[5] and Moz.[6] Elixir is also used for web development, by companies such as Bleacher Report and Inverse,[7] and for building embedded-systems.[8][9] The community organizes yearly events in both United States[10][11][12] and Europe[13] as well as minor local events and conferences.[14][15]"}

  bench "msgpax iodata" do
    Msgpax.pack!(@data)
  end

  bench "msgpax binary" do
    Msgpax.pack!(@data, iodata: false)
  end

  bench "poison" do
    Poison.encode!(@data)
  end

  bench "bert" do
    :erlang.term_to_binary(@data)
  end
end
Unpacking
defmodule UnpackBench do
  use Benchfella

  @data %{username: "username", user_id: 123, content: "Elixir is a functional, concurrent, general-purpose programming language that runs on the Erlang virtual machine (BEAM). Elixir builds on top of Erlang and shares the same abstractions for building distributed, fault-tolerant applications. Elixir also provides a productive tooling and an extensible design. The latter is supported by compile-time metaprogramming with macros and polymorphism via protocols.[4] Elixir is successfully used in the industry by companies such as Pinterest[5] and Moz.[6] Elixir is also used for web development, by companies such as Bleacher Report and Inverse,[7] and for building embedded-systems.[8][9] The community organizes yearly events in both United States[10][11][12] and Europe[13] as well as minor local events and conferences.[14][15]"}

  bench "msgpax iodata", [packed_data: Msgpax.pack!(@data)] do
    Msgpax.unpack!(packed_data)
  end

  bench "msgpax binary", [packed_data: Msgpax.pack!(@data, iodata: false)] do
    Msgpax.unpack!(packed_data)
  end

  bench "poison", [packed_data: Poison.encode!(@data)] do
    Poison.decode!(packed_data)
  end

  bench "bert", [packed_data: :erlang.term_to_binary(@data)] do
    :erlang.binary_to_term(packed_data)
  end
end
Versions
defp deps do
  [{:msgpax, github: "lexmag/msgpax"},
   {:poison, "~> 3.1"},
   {:benchfella, "~> 0.3"}]
end
It would be interesting to check the space efficiency with larger data structures. What you’ve got there just looks like a couple of extra pieces of data denoting the size of the data itself; MessagePack is more space-efficient because it leaves them out, and the trade-off is that it has to calculate those numbers as it parses the data.
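A quick way to check would be something like the following (a sketch, assuming Msgpax is available in the project; the 1,000-entry map is arbitrary, and the byte counts will vary with the data shape):

```elixir
# Compare serialized sizes of BERT vs MessagePack for a larger map.
data = Map.new(1..1_000, fn i -> {Integer.to_string(i), i} end)

bert_size    = byte_size(:erlang.term_to_binary(data))
msgpack_size = byte_size(Msgpax.pack!(data, iodata: false))

IO.puts("BERT: #{bert_size} bytes, MessagePack: #{msgpack_size} bytes")
```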
If the size difference is minor, the performance gain with BERT seems spelled out in @idi527’s benchmarks. Both are so much faster than Poison that either looks like a clear win, though. I’ve definitely seen MessagePack in more widespread use, which would tend to make me lean that direction in most cases.
I wonder why the difference between :erlang.term_to_binary and Msgpax.pack! is so big, whereas :erlang.binary_to_term and Msgpax.unpack! show almost identical times.
:erlang.term_to_binary/1 basically just takes a term and binary-packs it using its own internal serializer, which involves practically no operating on the data; it is just following pointers.
:erlang.binary_to_term/1 has to do things like parse the binary to make sure it is valid, look up atoms in the global atom table to get their index, allocate memory, etc.
I’d bet :erlang.binary_to_term/1 would be a lot faster if you packed the atoms using the internal table indexes (the full external term format with the atom cache), as the decoding would then be significantly faster.
EDIT: Or turn the keys from atoms into strings and see how the times change then?
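For the record, converting the benchmark data’s keys is a one-liner (shown here on a cut-down version of the map):

```elixir
data = %{username: "username", user_id: 123}

# Same map with string keys instead of atoms.
string_keyed = Map.new(data, fn {k, v} -> {Atom.to_string(k), v} end)
# => %{"user_id" => 123, "username" => "username"}
```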
It is not a surprise that :erlang.term_to_binary is faster since, in contrast with Msgpax.pack!, it is implemented in C.
:erlang.binary_to_term is also implemented in C, but its difference from Msgpax.unpack! is smaller because of an important optimization in Msgpax.unpack! that utilizes a single match context.
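The “single match context” trick is the standard BEAM pattern for binary handling: match the remaining binary and recurse on it in tail position, so the runtime can reuse one match context instead of allocating sub-binaries on every step. A stripped-down illustration of the pattern (not Msgpax’s actual code):

```elixir
defmodule Frames do
  # Parses length-prefixed frames: <<len, payload::binary-size(len), ...>>.
  # Recursing on `rest` in tail position lets the compiler keep a single
  # match context alive across the whole input binary.
  def parse(bin), do: parse(bin, [])

  defp parse(<<>>, acc), do: Enum.reverse(acc)

  defp parse(<<len, payload::binary-size(len), rest::binary>>, acc) do
    parse(rest, [payload | acc])
  end
end

Frames.parse(<<2, "hi", 3, "you">>)
# => ["hi", "you"]
```

Compiling with ERL_COMPILER_OPTIONS=bin_opt_info reports whether the match context is in fact being reused.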
EDIT: Or turn the keys from atoms into strings and see how the times change then?
## PackBench
benchmark name          iterations   average time
strings bert            1000000000   0.01 µs/op
atoms bert              1000000000   0.01 µs/op
strings msgpax iodata   1000000      1.94 µs/op
atoms msgpax iodata     1000000      2.18 µs/op
strings msgpax binary   1000000      2.41 µs/op
atoms msgpax binary     1000000      2.68 µs/op
atoms poison            50000        47.22 µs/op
strings poison          50000        47.58 µs/op
## UnpackBench
benchmark name          iterations   average time
strings bert            10000000     0.49 µs/op
atoms bert              10000000     0.78 µs/op
strings msgpax binary   1000000      1.17 µs/op
atoms msgpax binary     1000000      1.22 µs/op
strings msgpax iodata   1000000      1.68 µs/op
atoms msgpax iodata     1000000      1.70 µs/op
strings poison          50000        29.51 µs/op
atoms poison            100000       29.67 µs/op
Yeah, seems to be faster with string keys, thanks.
I’d bet would be a lot faster if you packed the atoms using the internal table indexes (the full external term format with atom cache) as the decoding would be significantly faster I’d bet then.
How would I go about doing that?
Nice, my guess was right: it almost halved the term processing time.
Short of manually crafting the binaries or hooking into the local remote term transmitter, not easily… ^.^;
I’m sorry to bother you with these silly benchmarks, but could you please clarify what the difference between packing and unpacking with or without iodata is in “real world” apps? Besides the small difference in the benchmark results.
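For context on what iodata buys in practice (as I understand it: functions like File.write/2 and :gen_tcp.send/2 accept iodata directly, so packed output can be written without first being flattened into one contiguous binary, which avoids a copy):

```elixir
# Iodata is a nested list of binaries and bytes; nothing is flattened
# until (or unless) a consumer actually needs a single binary.
iodata = ["packed", [?\s, "pieces"]]

IO.iodata_to_binary(iodata)
# => "packed pieces"

# File.write/2 takes iodata as-is, so e.g. Msgpax.pack!/1 output can go
# straight to disk or a socket without IO.iodata_to_binary/1 in between.
File.write("/tmp/payload.bin", iodata)
```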