Anyone using Erlang External Term Format (ETF) instead of e.g. JSON?

Qqwy · November 11, 2018, 7:29pm

So for Planga, we are experimenting with communicating between nodes (written in different languages; in our case we currently have an Elixir and a Ruby node).

This communication now goes over RabbitMQ, with Erlang’s ETF (External Term Format) as the serialization format rather than JSON. We chose ETF because:

- It is compact out of the box. (With JSON you can hope for gzipping, but I don’t know whether e.g. RabbitMQ does that.)
- More data types can be encoded directly. (JSON does not distinguish between integers and floats, and numbers are only guaranteed to interoperate within IEEE 754 double precision.)
- There are no restrictions on the top-level type. (In JSON as originally specified in RFC 4627, the top level had to be an object or an array; RFC 7159 and later allow any value, but older parsers may still insist on it.)

Specifically, between Ruby and Elixir, we’re able to simply send big-number integers over the wire, as well as e.g. symbols/atoms.
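To make the type-preservation point concrete, here is a minimal Elixir-only sketch (the Ruby side is not shown, and the sample terms are just illustrations) that round-trips a bignum, an atom, an integer and a float, a bare top-level string, and a tuple through `:erlang.term_to_binary/1` and `:erlang.binary_to_term/1`, checking each with strict equality:

```elixir
# Round-trip a few terms through ETF and check that their exact types survive.
terms = [
  # An integer far beyond 64 bits (encoded as a bignum)
  Integer.pow(2, 200),
  # Atoms map to symbols on the Ruby side
  :ok,
  # 1 and 1.0 stay distinct: integer vs float
  1,
  1.0,
  # A bare scalar at the top level, plus a tuple (which JSON has no equivalent for)
  "just a string",
  {:user, 42, ["admin"]}
]

for term <- terms do
  decoded = term |> :erlang.term_to_binary() |> :erlang.binary_to_term()
  # `===` is strict (1 === 1.0 is false), so this also checks integer vs float
  IO.inspect(decoded === term, label: inspect(term))
end
```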

However, there obviously are also drawbacks:

- Be mindful that the atom (symbol) table might fill up when decoding untrusted data: atoms are not garbage-collected on the BEAM, so this is a memory leak and therefore a DoS vector.
- ETF can contain encoded functions (funs), which you do not want to execute if they come from an untrusted source. (And ‘stored procedures’ are of questionable usefulness according to many developers.)
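Both drawbacks can be observed directly. The sketch below is an illustration, not production code: it builds an ETF binary by hand containing an atom the VM has not seen yet (using the `SMALL_ATOM_UTF8_EXT` tag, 119), shows that plain `:erlang.binary_to_term/1` creates it, and shows that a fun survives a round trip:

```elixir
# Hand-built ETF binary holding an atom this VM has (almost certainly) never
# seen: 131 = ETF version byte, 119 = SMALL_ATOM_UTF8_EXT tag, then a
# one-byte length and the atom's UTF-8 text.
name = "never_seen_#{System.unique_integer([:positive])}"
payload = <<131, 119, byte_size(name), name::binary>>

atoms_before = :erlang.system_info(:atom_count)
atom = :erlang.binary_to_term(payload)
atoms_after = :erlang.system_info(:atom_count)

# Decoding created a brand-new atom. Atoms are never garbage-collected, so
# enough such payloads can exhaust the atom table and bring the VM down.
IO.inspect({atom, atoms_after - atoms_before}, label: "new atom, count delta")

# Funs survive a round trip as well. Decoding does not run them, but any
# code that later calls a fun taken from untrusted input runs foreign code.
fun = fn -> :side_effect end
decoded_fun = fun |> :erlang.term_to_binary() |> :erlang.binary_to_term()
IO.inspect(is_function(decoded_fun, 0), label: "decoded a fun")
```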

So I’m wondering: are there other people who prefer ETF over JSON or another format? Why or why not?

tty · November 11, 2018, 7:43pm

We currently use BERT between Erlang/Java in one subsystem and JSON between Erlang/Java in another. I have also used BERT in other projects.

We found JSON to be a PITA to change and update compared to BERT. This is partly because, at least on the Java side, we can rely on (minimal) compiler type checking.

You could use a JSON validator, but the Erlang validator was feature-weak (4 years ago). YMMV.

I prefer BERT because it is a wire protocol and compact. I would even consider protobuf over JSON/XML any day.

wojtekmach · November 11, 2018, 8:03pm

Hex.pm API responds with JSON or ETF. The latter is very convenient as we don’t need a JSON parser to understand it:

```
curl --silent -H "accept: application/vnd.hex+erlang" https://hex.pm/api | elixir -e "IO.read(:stdio, :all) |> :erlang.binary_to_term() |> IO.inspect()"
%{
  "documentation_url" => "http://docs.hexpm.apiary.io",
  "key_url" => "https://hex.pm/api/keys/{name}",
  "keys_url" => "https://hex.pm/api/keys",
  "package_owners_url" => "https://hex.pm/api/packages/{name}/owners",
  "package_release_url" => "https://hex.pm/api/packages/{name}/releases/{version}",
  "package_url" => "https://hex.pm/api/packages/{name}",
  "packages_url" => "https://hex.pm/api/packages"
}
```

Eiji · November 11, 2018, 10:52pm

@Qqwy I’m using BERT as often as possible, and JSON only when it’s required. I like this format because I don’t need to add any external library, I can get the whole job done in a minimal number of lines, plus all the reasons you already mentioned. My main use case is Elixir-only (my private projects), temporarily with JavaScript (client), and Elixir-only again in the future (Scenic client + Elixir-to-WebAssembly client).

rvirding · November 12, 2018, 12:09am

Have you tried :erlang.binary_to_term/2? It has a safe option designed to help with exactly this.
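A minimal sketch of what the `safe` option does in practice, assuming nothing beyond the standard `:erlang.binary_to_term/2` call: a hand-encoded atom that does not exist yet is rejected (the BIF fails with `badarg`, which Elixir raises as `ArgumentError`), while atoms that already exist decode normally. Note that `safe` only blocks the creation of new atoms and external function references; a fun whose atoms already exist can still be decoded.

```elixir
# An atom this VM almost certainly has not seen yet, encoded by hand
# (131 = ETF version, 119 = SMALL_ATOM_UTF8_EXT, then length + UTF-8 bytes).
name = "unknown_#{System.unique_integer([:positive])}"
untrusted = <<131, 119, byte_size(name), name::binary>>

# With [:safe], decoding refuses to create the new atom and raises instead.
result =
  try do
    {:ok, :erlang.binary_to_term(untrusted, [:safe])}
  rescue
    ArgumentError -> :rejected
  end

IO.inspect(result, label: "unknown atom")

# Atoms that already exist decode fine in safe mode.
IO.inspect(:erlang.binary_to_term(:erlang.term_to_binary(:ok), [:safe]), label: "known atom")
```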

Qqwy · November 12, 2018, 4:21am

Wow! Great! I was not aware of this feature, and I will immediately start using it!

michalmuskala · November 12, 2018, 8:39am

I wish there was another one called data_only or something similar that would forbid funs. If that was the case, it probably would cover all the needs automatically. Right now, in most cases, I need to traverse the decoded data to check if there were any funs in there.
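The traversal described here can be sketched roughly as follows. `NoFuns` is a hypothetical module name, and this is only an illustration of the idea (decode with `[:safe]`, then walk every list, tuple, and map and reject the whole term if a fun turns up), not the implementation of any particular library:

```elixir
defmodule NoFuns do
  # Hypothetical helper: decode with :safe, then walk the result and
  # reject it if any fun is found anywhere inside.
  def decode!(binary) do
    term = :erlang.binary_to_term(binary, [:safe])
    check!(term)
    term
  end

  defp check!(fun) when is_function(fun),
    do: raise(ArgumentError, "encoded funs are not allowed")

  defp check!(list) when is_list(list), do: check_list!(list)

  defp check!(tuple) when is_tuple(tuple),
    do: tuple |> Tuple.to_list() |> check_list!()

  # Map.to_list/1 also works on structs (which are not Enumerable);
  # each {key, value} pair is then checked by the tuple clause.
  defp check!(map) when is_map(map), do: map |> Map.to_list() |> check_list!()

  defp check!(_other), do: :ok

  # Walks proper and improper lists alike ([a | b] where b is not a list).
  defp check_list!([head | tail]) do
    check!(head)
    check_list!(tail)
  end

  defp check_list!([]), do: :ok
  defp check_list!(improper_tail), do: check!(improper_tail)
end

IO.inspect(NoFuns.decode!(:erlang.term_to_binary(%{a: [1, {2, "x"}]})))
```

As noted further down the thread, the cost of this approach is the extra pass over the decoded data.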

Are you using BERT as defined in http://bert-rpc.org/ or :erlang.term_to_binary directly? They are two different things. In particular BERT itself doesn’t do maps and has a separate, much more verbose encoding.
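To illustrate the difference, here is a rough sketch that rewrites maps into the BERT "complex type" form `{bert, dict, [{Key, Value}, ...]}` from the BERT spec and compares the encoded size against a native map. `BertDict` is a hypothetical helper, it handles only dictionaries (not the spec's other complex types such as nil, booleans, or time), and it assumes proper lists:

```elixir
defmodule BertDict do
  # Hypothetical helper: rewrite maps into the BERT complex-type form
  # {:bert, :dict, [{key, value}, ...]}, recursing into (proper) lists and
  # tuples so nested maps are converted too.
  def to_bert(map) when is_map(map) do
    pairs = for {k, v} <- Map.to_list(map), do: {to_bert(k), to_bert(v)}
    {:bert, :dict, pairs}
  end

  def to_bert(list) when is_list(list), do: Enum.map(list, &to_bert/1)

  def to_bert(tuple) when is_tuple(tuple),
    do: tuple |> Tuple.to_list() |> Enum.map(&to_bert/1) |> List.to_tuple()

  def to_bert(other), do: other
end

term = %{"name" => "planga", "tags" => ["chat"]}

# Native ETF map encoding vs the BERT-style tuple encoding of the same data
native = :erlang.term_to_binary(term)
bert = :erlang.term_to_binary(BertDict.to_bert(term))

IO.inspect({byte_size(native), byte_size(bert)}, label: "native vs BERT bytes")
```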

Eiji · November 12, 2018, 10:14am

Oh, sorry, I thought that :erlang.term_to_binary was the BERT implementation. Is there another implementation written in Erlang? If so, which one is better for Elixir ↔ other language and for Elixir ↔ Elixir?

tme_317 · November 12, 2018, 2:12pm

Any downside to using Plug.Crypto.safe_binary_to_term to filter out any funs? At least that’s what I’ve been doing… and it looks like they have recently extracted plug_crypto to a separate library in case you’re not already using plug itself.

Since options are passed to erlang I guess you could use Plug.Crypto.safe_binary_to_term(bin, [:safe]) to protect against atom exhaustion also?

michalmuskala · November 13, 2018, 12:01pm

No downside, just that it’s slower than a native option could be since you have to traverse the data after it’s decoded. For now that’s the best option, though.

bettio · November 13, 2018, 1:29pm

Yesterday I wrote a post on this topic, so I would like to share my opinion here: https://blog.ispirata.com/how-to-destroy-your-application-using-erlang-binary-to-term-1-575ff7d05333 (I was going to bring this up in the Bert.js topic). Let me know if you disagree.

keathley · November 13, 2018, 3:39pm

We’ve (meaning @bgmarx) explored this option for our internal service communication. There are a few use cases where it would be really beneficial, but we also had several payloads that were faster to encode in JSON and resulted in smaller output. These requests are outside the norm, so this shouldn’t be taken as an indictment of ETF; it’s just a reminder that it’s always important to measure your specific use case. Because of those payloads, and because it would require a large change throughout our stack, we decided not to pursue it further for the time being.
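A starting point for that kind of measurement might look like the sketch below. `PayloadSize` and the sample payloads are hypothetical; it reports plain and `:compressed` ETF sizes plus a single rough encode timing via `:timer.tc/1`. A single timing is noisy, so for real decisions a benchmarking tool such as Benchee and your actual payloads and JSON encoder are a better basis for comparison:

```elixir
defmodule PayloadSize do
  # Hypothetical helper: report ETF size (plain and :compressed) and a rough,
  # single-run encode time in microseconds for one payload.
  def report(label, payload) do
    {micros, plain} = :timer.tc(fn -> :erlang.term_to_binary(payload) end)
    compressed = :erlang.term_to_binary(payload, [:compressed])

    %{
      label: label,
      etf_bytes: byte_size(plain),
      etf_compressed_bytes: byte_size(compressed),
      encode_us: micros
    }
  end
end

# Sample payloads for illustration only; substitute your own traffic.
payloads = [
  {"small map", %{"id" => 1, "body" => "hi"}},
  {"repetitive list", List.duplicate(%{"status" => "ok", "count" => 0}, 1_000)}
]

for {label, payload} <- payloads do
  IO.inspect(PayloadSize.report(label, payload))
end
```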