Binary serialisation formats

NobbZ · November 1, 2017, 5:25pm

I’m currently searching for a well specified binary serialisation format.

Requirements:

Some kind of type-checking (if not baked into the format itself, something external like JSON spec is OK, as long as it is well established).
Compression included
Support for at least Go, JavaScript, BEAM, JVM.
Knows about BigNumbers
Knows the difference between Integer and Float

If it weren’t for the typing requirement I’d go for the Erlang Term Format, but…

OvermindDL1 · November 1, 2017, 5:31pm

Well the erlang term format is strongly typed, so, why not? ^.^

But still, what kind of type checking would you prefer, and what style? Hmm, let me ask these questions (I know a lot of different formats):

Do you want a free-form format (where you can add/remove structures to it as you wish) or do you want it fully specified ahead of time (think like a struct)?
Do you need it to be versioned so it will be compatible with changes?
If versioned do you want it to be both forwards and backwards compatible or just backwards?
How ‘fast’ do you want de-serialization to be and what features would you be willing to give up for it (such as micro-integer packing and such)?
How ‘packed’ do you want it to be and what features would you be willing to give up for it (such as giving up the ability to read the data from the stream directly instead of being required to parse it out)?

Etc…

NobbZ · November 1, 2017, 9:32pm

After I had some time to actually think more about it, I came to the conclusion that I do not need a formal way to specify types. I can just start out informally and let the types grow and evolve over time as the need arises.

So actually I can use one of the binary serialisation formats I’ve at least heard of so far:

ETF: I am not sure how well it is supported in the Java and JavaScript worlds. For Go I haven’t found anything yet, but there I could implement the necessary subset on my own…
BSON, which sells itself as binary JSON: I therefore fear the same problem with numerical values as in JSON itself, but it lists distinct types for double and integer, so I’m not really sure about that. Still, BigInt is missing…
Protocol Buffers, which do not seem to support BigInt.

So, if it weren’t for Java and JavaScript as targets, I’d go with ETF…

Another big worry: if I have an integer that gets converted to a float, with loss of precision, when parsed in JavaScript and is then repacked, how do I deal with this inaccuracy? Are there serialisers available that can guarantee (at least in an internal representation) that values do not change simply by reading them in and writing them back?
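
To make the worry concrete, here is a minimal sketch of the lossy round trip, written in Python for illustration only. The helper names `roundtrip_as_double` and `is_lossless_as_double` are made up for this example; they merely simulate a parser that keeps every number in a 64-bit float, which is what a plain JavaScript number is.

```python
# IEEE 754 doubles carry a 53-bit significand, so 2**53 + 1 is the first
# positive integer that cannot be represented exactly.
MAX_SAFE = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def roundtrip_as_double(n: int) -> int:
    """Simulate a parser that stores every number as a 64-bit float."""
    return int(float(n))


def is_lossless_as_double(n: int) -> bool:
    """Return True if n survives a trip through a 64-bit float unchanged."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # n is too large to be converted to a double at all.
        return False


original = 2**53 + 1
damaged = roundtrip_as_double(original)

print(original)  # 9007199254740993
print(damaged)   # 9007199254740992, off by one after the round trip
```

A serialiser could run a check like `is_lossless_as_double` before handing a value to a double-only consumer and fall back to a string or byte representation when it fails. Whether a given library actually does this has to be verified per library.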

OvermindDL1 · November 1, 2017, 9:39pm

Java/javascript has support for big integers how? o.O

Protocol Buffers. It is a good bit slower than most, but it is very reliable and can ‘pass through’ things it does not touch.

However if you are wanting big integers, well I think the erlang term format is about the only one that has that unless you want to encode them as a binary blob or so. ^.^;
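
The "binary blob" route could look roughly like the sketch below: the big integer is stored as big-endian two's complement bytes in whatever byte-string type the chosen format offers. The function names `bigint_to_blob` and `blob_to_bigint` are hypothetical, and the byte layout is an assumption picked because it is the layout Java's `BigInteger(byte[])` constructor reads; it is not something any of the formats above mandates.

```python
def bigint_to_blob(n: int) -> bytes:
    """Encode an arbitrary-size signed integer as big-endian two's complement."""
    # One extra bit for the sign, rounded up to whole bytes.
    length = (n.bit_length() + 8) // 8
    return n.to_bytes(length, byteorder="big", signed=True)


def blob_to_bigint(blob: bytes) -> int:
    """Decode big-endian two's complement bytes back into an integer."""
    return int.from_bytes(blob, byteorder="big", signed=True)


value = -(2**200) + 12345
blob = bigint_to_blob(value)
print(len(blob))                      # number of bytes in the blob
print(blob_to_bigint(blob) == value)  # True
```

The price is that the format itself no longer knows the field is a number, so every consumer has to agree on this convention out of band.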

There are libraries for ETF in both java and javascript, and even if there were not it is a well documented and easy to implement format (unlike protocol buffers). I’ve implemented it a few times already in a few languages. ^.^
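
As a rough illustration of how small the integer part of such an implementation is, here is a sketch in Python that covers only the integer tags of the external term format (`SMALL_INTEGER_EXT`, `INTEGER_EXT`, `SMALL_BIG_EXT`, `LARGE_BIG_EXT`). The function names `encode_integer` and `decode_integer` are invented for this example, and atoms, tuples, lists, binaries, floats and compressed terms are deliberately left out, so treat it as a starting point rather than a complete codec.

```python
import struct

VERSION = 131            # leading version byte of the external term format
SMALL_INTEGER_EXT = 97   # unsigned 8-bit integer
INTEGER_EXT = 98         # signed 32-bit integer, big-endian
SMALL_BIG_EXT = 110      # 1-byte length, sign byte, little-endian magnitude
LARGE_BIG_EXT = 111      # 4-byte length, sign byte, little-endian magnitude


def encode_integer(n: int) -> bytes:
    """Encode an integer of any size, picking the smallest fitting tag."""
    if 0 <= n <= 255:
        return bytes([VERSION, SMALL_INTEGER_EXT, n])
    if -(2**31) <= n < 2**31:
        return bytes([VERSION, INTEGER_EXT]) + struct.pack(">i", n)
    sign = 1 if n < 0 else 0
    magnitude = abs(n)
    digits = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "little")
    if len(digits) <= 255:
        return bytes([VERSION, SMALL_BIG_EXT, len(digits), sign]) + digits
    header = bytes([VERSION, LARGE_BIG_EXT]) + struct.pack(">I", len(digits))
    return header + bytes([sign]) + digits


def decode_integer(data: bytes) -> int:
    """Decode a payload produced by encode_integer (integer tags only)."""
    if data[0] != VERSION:
        raise ValueError("not an external term format payload")
    tag, body = data[1], data[2:]
    if tag == SMALL_INTEGER_EXT:
        return body[0]
    if tag == INTEGER_EXT:
        return struct.unpack(">i", body[:4])[0]
    if tag == SMALL_BIG_EXT:
        count, sign, digits = body[0], body[1], body[2:]
    elif tag == LARGE_BIG_EXT:
        count = struct.unpack(">I", body[:4])[0]
        sign, digits = body[4], body[5:]
    else:
        raise ValueError(f"unsupported tag {tag}")
    value = int.from_bytes(digits[:count], "little")
    return -value if sign else value


payload = encode_integer(2**64)
print(list(payload))            # raw bytes of the SMALL_BIG_EXT encoding
print(decode_integer(payload))  # 18446744073709551616
```

Because the decoder keeps the value in an arbitrary-precision integer, reading a term and writing it back yields the same bytes for everything this encoder produces. A JavaScript port would need a big-integer type or library to give the same guarantee.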

NobbZ · November 1, 2017, 9:44pm

JavaScript doesn’t have them, that’s one of my problems. Java has BigInteger: no literals, but usable enough.

I’d prefer to have other people implement the JavaScript stuff, so a ready-to-use library is preferred. But perhaps I’ll simply postpone the JS target as long as possible.

OvermindDL1 · November 1, 2017, 9:46pm

I’ve actually seen a couple of javascript libraries that decode ETF floating around. I did not check if they can ‘encode’ to it too (I did not need that at the time) but there are ones that can decode (and that is half the work done then).