Rich_Morin
Data serialization formats
I’d like to start a discussion of data serialization formats, in the context of Elixir. The rest of this note is a combination of personal opinions and links to useful resources; feel free to jump in with your own clues, pointers, reactions, war stories, etc. (ducking…)
edn
edn (extensible data notation) is a subset of Clojure, extracted by Rich Hickey. edn has a rich set of built-in data types, most of which are a good match for Elixir. In addition, it has a mechanism for extending this set with custom data types.
However, because edn is closely tied to Clojure, Transit (see below) may be a better choice for interoperability. For details, see edn’s GitHub page and eden’s Hex page.
JSON
JSON is a subset of JavaScript, extracted by Douglas Crockford. Although JSON shines at interoperability and standardization, it has very limited (and JavaScript-specific) data types. So, for example, I wouldn’t recommend it for cases where one needs to retain and/or transmit specific data types.
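To make the data type point concrete, here is a minimal sketch of a round trip through JSON. It assumes Elixir 1.18 or later, which ships a built-in JSON module; on earlier versions a library such as Jason would play the same role. The sample map is made up for illustration.

```elixir
# Assumes Elixir 1.18+ (built-in JSON module). The data is hypothetical.
original = %{status: :ok, count: 1}

encoded = JSON.encode!(original)
decoded = JSON.decode!(encoded)

# JSON has no atom type, so the atom keys and the atom value are
# expected to come back as plain strings rather than as atoms.
IO.inspect(decoded, label: "decoded")
IO.inspect(decoded == original, label: "same as original?")
```

If the types matter on the receiving side, they have to be restored by hand (or by a schema), which is the kind of work the extensible formats try to avoid.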
JSON is also poorly suited for generating human-readable documents. It’s possible to include comments by using data elements, but this is a hack. And, although it’s quite possible to format JSON nicely, many programmers don’t make the effort. So, a lot of JSON “in the wild” is difficult for humans to read.
JSON-LD (JavaScript Object Notation for Linked Data) is a JSON-based method of encoding linked data. Thus, it can take the place of RDF-encoding formats such as N-Triples, RDF/XML, and Turtle.
TOML
TOML is an acronym for “Tom’s Obvious, Minimal Language”, referring to its creator, Tom Preston-Werner. Although TOML has very limited data types, it excels at generating human-readable documents.
Because the top of each “section” (i.e., sub-tree) can be encoded as a path, TOML works well for encoding deeply-nested hierarchical structures:
```toml
[a.b.c.d]
e = 42
```
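As a rough illustration of what that header stands for, the sketch below builds the equivalent nested structure as plain Elixir maps. It does not parse TOML; the variable names are made up, and a real decoder may choose different key types.

```elixir
# The dotted header [a.b.c.d] names a path of nested tables.
path = ["a", "b", "c", "d"]

# Wrap the innermost table in one map per path segment, innermost first.
nested = List.foldr(path, %{"e" => 42}, fn key, acc -> %{key => acc} end)

# Walk the same path back down to reach the value.
IO.inspect(nested, label: "nested")
IO.inspect(get_in(nested, path ++ ["e"]), label: "e")
```

One header line stands in for four levels of nesting, which is why the format stays readable for deep trees.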
Transit
Transit is conceptually similar to edn, in that it is an extensible format with strong data type capabilities. However, it is considerably less tied to the Clojure language. Also, its “wire format” uses JSON or MessagePack. For details, see transit_elixir’s Hex page.
YAML
YAML (“YAML Ain’t Markup Language”) is generally well suited to writing by humans, although the need for multiple levels of indentation can become an issue for deeply nested trees. Also, the syntax definition is rather large, so reading some YAML documents can be difficult. Finally, because YAML “in the wild” isn’t well standardized, interoperability can be an issue.
Most Liked Responses
anuaralfetahe
I had to serialize Elixir data structures before pushing them to a Kafka topic, so I used :erlang.term_to_binary/1. The performance was good, and I was easily able to deserialize the data on the consumer side (also running Elixir). This function is also useful when working with C NIFs.
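For anyone who hasn't tried it, a minimal sketch of that round trip looks something like this. The sample term is made up, and the Kafka producer and consumer are left out entirely.

```elixir
# A hypothetical term mixing atoms, a tuple, a list, and a Date struct.
term = %{id: 42, tags: [:a, :b], point: {1.5, 2.5}, on: ~D[2020-01-01]}

# Producer side: encode the term in the Erlang external term format.
binary = :erlang.term_to_binary(term)

# Consumer side: decode it again. The :safe option refuses to create
# new atoms, which is worth considering when the sender isn't trusted.
decoded = :erlang.binary_to_term(binary, [:safe])

IO.inspect(is_binary(binary), label: "is a binary?")
IO.inspect(decoded == term, label: "round trip preserved the term?")
```

Unlike JSON, nothing needs to be converted back by hand afterwards, but both ends have to understand the Erlang term format.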
I noticed that I can push raw Elixir data structures to RabbitMQ without any serialization at all. Probably because it is written in Erlang.
When dealing with different technologies I’ve found JSON the easiest to work with because of the wide use and support it has.
D4no0
What about ASN.1? It is part of OTP, and it is a fully fledged, standardised implementation for binary encoding/decoding. If you have never used it, protobufs is basically a reimplementation of a small set of features from ASN.1.
While very few people use this protocol, I think it has very big potential in systems where data consistency matters.
The only issue currently is that using it from Elixir is very hard; it needs a wrapper with updated documentation.
cmo
I use MessagePack for internal communication over HTTP because it is smaller than JSON and that sort of thing floats my boat.