Jexon - Lossless encoding to JSON

Menkir · August 7, 2023, 1:15pm

Hi everyone,

I recently released a small library built on top of Jason that converts Elixir maps and structs to JSON without losing type information. Atoms and tuples have no direct representation in JSON, so keys and values of these types are encoded with special prefixes. Additionally, structs can be encoded to JSON directly, without the @derive attribute.
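To illustrate the prefix idea, here is a minimal sketch in plain Elixir with no Jason dependency. It is not Jexon's implementation: only the `__atom__:` prefix is taken from this thread, and the `PrefixSketch` module is hypothetical. It tags atom keys and atom values, leaves `true`, `false` and `nil` alone (they are atoms in Elixir but native JSON values), and does not handle structs or tuples.

```elixir
defmodule PrefixSketch do
  # Hypothetical sketch, not Jexon's code: tag atoms with a "__atom__:"
  # prefix so the type information survives a trip through JSON.
  @atom_prefix "__atom__:"

  # true, false and nil are atoms in Elixir but have native JSON
  # equivalents, so they pass through unchanged.
  def encode(value) when value in [true, false, nil], do: value
  def encode(value) when is_atom(value), do: @atom_prefix <> Atom.to_string(value)

  # Plain maps only: structs do not implement Enumerable and are not handled.
  def encode(map) when is_map(map) do
    Map.new(map, fn {k, v} -> {encode_key(k), encode(v)} end)
  end

  def encode(list) when is_list(list), do: Enum.map(list, &encode/1)
  def encode(other), do: other

  # JSON object keys must be strings, so every atom key gets the prefix.
  # Other key types (e.g. tuples) raise FunctionClauseError in this sketch.
  defp encode_key(key) when is_atom(key), do: @atom_prefix <> Atom.to_string(key)
  defp encode_key(key) when is_binary(key), do: key
end
```

The result has only string keys and JSON-native values, so any JSON encoder can serialize it.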

The library offers three functions:

to_map: formats structs into maps and stores the __struct__ key.
to_json: encodes a term to a JSON string.
from_json: decodes such a JSON string back into the original term.

Feel free to contribute in case you have more edge cases to consider.

Happy Coding!

mayel · August 7, 2023, 3:43pm

Looks interesting! Some examples of the JSON output would be useful in the readme.

al2o3cr · August 7, 2023, 8:13pm

One tricky, extreme edge case: Jexon.to_json will produce the same output for these two maps:

%{"__atom__:foo" => 1}
# vs
%{foo: 1}

A less extreme case: what about maps with other kinds of terms as keys? For instance, a lot of “sparse grid” problems make sense as a map from {x, y} tuples to values:

game_of_life_grid = %{
  {0, 0} => 1,
  {0, 1} => 2,
  # etc
}
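One possible way to support such keys, sketched as a hypothetical and not as something Jexon does: since JSON object keys must be strings, a map with non-string keys can be written as a list of `[key, value]` pairs, and tuples as a tagged list. The `__tuple__` and `__pairs__` tags and the `TupleKeySketch` module are invented for this example, and it covers plain maps only, not structs.

```elixir
defmodule TupleKeySketch do
  # Hypothetical sketch: the "__tuple__" and "__pairs__" tags are invented.
  # Tuples become a tagged list of their (recursively encoded) elements.
  def encode(tuple) when is_tuple(tuple) do
    %{"__tuple__" => tuple |> Tuple.to_list() |> Enum.map(&encode/1)}
  end

  # Maps with only string keys stay JSON objects; any other map is written
  # as a list of [key, value] pairs so that its keys can be arbitrary terms.
  def encode(map) when is_map(map) do
    if Enum.all?(Map.keys(map), &is_binary/1) do
      Map.new(map, fn {k, v} -> {k, encode(v)} end)
    else
      %{"__pairs__" => Enum.map(map, fn {k, v} -> [encode(k), encode(v)] end)}
    end
  end

  def encode(list) when is_list(list), do: Enum.map(list, &encode/1)

  # Atoms and other scalars pass through unchanged here; a full encoder
  # would combine this with the atom prefixing.
  def encode(other), do: other
end
```

Decoding would reverse the tags and rebuild the map from the pairs, for example with `Map.new/1`.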

Menkir · August 8, 2023, 5:00am

For the extreme edge case: yes, both lead to the same result, because any other encoder like Jason or Poison would encode atoms as strings. If you want to keep the type information so that you can parse the JSON back into the original data structure, you need to encode the type, like __atom__:...; otherwise you can never tell whether a key was a string or an atom. Try it the other way around: take the JSON output and parse it back, and you will always get

%{foo: 1}

With Jason or Poison you need to pass an option like keys: :atoms, so it is up to the developer, not the data itself, to decide which types the keys have.
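The decoding side can be sketched the same way. This hypothetical `DecodeSketch` module (again not Jexon's code) works on the plain maps a JSON parser returns and turns every `__atom__:`-prefixed string back into an atom. It also shows why the two maps from the edge case come back identical: the key `"__atom__:foo"` always decodes to `:foo`.

```elixir
defmodule DecodeSketch do
  # Hypothetical sketch, not Jexon's code: reverse the "__atom__:" prefix
  # after the JSON has been parsed into plain maps, lists and scalars.
  @atom_prefix "__atom__:"

  def decode(map) when is_map(map) do
    Map.new(map, fn {k, v} -> {decode_scalar(k), decode(v)} end)
  end

  def decode(list) when is_list(list), do: Enum.map(list, &decode/1)
  def decode(value), do: decode_scalar(value)

  # String.to_existing_atom/1 refuses to create new atoms, which keeps
  # untrusted input from filling up the atom table.
  defp decode_scalar(@atom_prefix <> name), do: String.to_existing_atom(name)
  defp decode_scalar(other), do: other
end
```

For comparison, Jason's decoder only offers options such as `keys: :atoms` or `keys: :atoms!`, which apply to every key at once.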

Tuple keys are a good one. I didn’t consider tuples as keys. Will fix it soon. Thanks!

kpanic · August 8, 2023, 8:45am

Hey @Menkir – nice lib! kudos!

What would happen if you have a struct like this:

My.Fancy.Module.With.A.Struct

You serialize it (to_map) and store it somewhere in a database.
Then the dev changes their mind and renames the struct to My.Fancy.Module.Struct.
Would the from_json function then fail to decode the struct, since the module name changed?

Thanks again for creating this lib. I had the very same idea some time ago and wanted to give it a try but I have been lazy.

Menkir · August 8, 2023, 10:56am

Hi @kpanic

Yes, you will get a DecodeError if you try to decode an invalid struct from JSON. You need to write a migration script that replaces the old module names with the new ones, just as you would for SQL tables with Ecto.

# Example
defmodule Foo do
  defstruct ~w(foo baz bar)a
end

defmodule Baz do
  defstruct ~w(foo baz bar ban)a
end

{:ok, json} = Jexon.to_json(%Foo{foo: 1, baz: 2, bar: 3})

json = String.replace(json, "Elixir.Foo", "Elixir.Baz")

{:ok, %Baz{foo: 1, baz: 2, bar: 3, ban: nil}} = Jexon.from_json(json)
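If the data is migrated after decoding rather than as text, the same rename can be done on the terms themselves with the standard `Map.from_struct/1` and `struct/2`. A minimal sketch in plain Elixir; `MigrationSketch` is a hypothetical name, and `Foo`/`Baz` are the structs from the example above.

```elixir
defmodule Foo do
  defstruct ~w(foo baz bar)a
end

defmodule Baz do
  defstruct ~w(foo baz bar ban)a
end

defmodule MigrationSketch do
  # Rebuild an old struct as the new one. struct/2 keeps only keys that
  # exist in the target struct and fills the remaining ones with defaults.
  def migrate(%Foo{} = old), do: struct(Baz, Map.from_struct(old))
end
```

Unlike `String.replace/3` on the JSON text, this cannot accidentally rewrite unrelated strings, such as a module named `Elixir.FooBar`, which the text replacement would turn into `Elixir.BazBar`.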

adamu · August 11, 2023, 1:28am

Sorry to be that guy, but if the purpose is a kind of transport encoding with the goal of eventually decoding back into Elixir, the external term format (ETF) seems like a good choice, unless you need JSON for other interoperability reasons. Maybe some suggested use cases, or a mention of this in the docs, would be useful.

hst337 · August 11, 2023, 10:44am

x |> :erlang.term_to_binary() |> Base.encode64()

Produces a valid JSON string.
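A round-trip sketch of this suggestion using only the standard library. ETF keeps atoms and tuple keys as they are, so neither the prefix ambiguity nor the tuple-key problem arises. The `[:safe]` option on the decoding side is an addition not mentioned above, worth considering for data from untrusted sources.

```elixir
term = %{foo: 1, grid: %{{0, 0} => 1}}

# Encode: ETF bytes, then Base64 text that can sit inside a JSON string.
encoded = term |> :erlang.term_to_binary() |> Base.encode64()

# Decode: with :safe, binary_to_term/2 refuses to decode data that would
# create new atoms or external function references.
decoded = encoded |> Base.decode64!() |> :erlang.binary_to_term([:safe])
```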

dimitarvp · August 11, 2023, 11:55am

I am not gonna bust your chops like the others, but they do raise good points, like the ambiguity problems.

If it absolutely positively must be JSON then @hst337’s idea is the most concise and approachable. If you can branch out of JSON then MessagePack (represented by the Elixir library msgpax) is great.

Menkir · August 11, 2023, 12:04pm

I fully understand. We initially considered this format, but we had several concerns:

If you want to transport the state via JSON (for instance, for persistence), it first has to be encoded into Base64. You can then store it in any DB of your choice. But when you retrieve this state, you’re confronted with opaque binary data that neither the remote machine nor you can read.

With Jexon-encoded Elixir structs, it becomes feasible to diff different states or to carry out migrations when certain keys have changed.
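As a small illustration of working with decoded states, here is a hypothetical helper (`DiffSketch.changed_keys/2`, not part of Jexon) that lists the top-level keys whose values differ between two decoded maps. It uses `Map.fetch/2` so that a key missing on one side is distinguished from a key whose value is `nil`.

```elixir
defmodule DiffSketch do
  # Hypothetical helper: returns the sorted top-level keys whose values
  # differ between two states, including keys present on only one side.
  def changed_keys(old, new) when is_map(old) and is_map(new) do
    (Map.keys(old) ++ Map.keys(new))
    |> Enum.uniq()
    |> Enum.filter(fn key -> Map.fetch(old, key) != Map.fetch(new, key) end)
    |> Enum.sort()
  end
end
```

This is a shallow comparison; nested maps that differ are reported only by their top-level key.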

Debugging is more straightforward than with ETF. I can easily copy and paste parts of the state, or the entire state, into a remote IEx session and pipe it through functions to diagnose the issue.

Ultimately, the decision boils down to preference: would you rather store your data in binary or JSON format? In my specific situation, the data was initially stored locally in ETS. However, as is often the case, requirements shifted, and the state had to be transferred to a centralized backup service that accepts only JSON.

al2o3cr · August 11, 2023, 1:34pm

YMMV: I’ve specifically chosen ETF for exactly this reason before. The reasoning is based on Hyrum’s Law: if users can see the inside of the state, they will start depending on its internal details.

DEvil0000 · August 11, 2023, 2:15pm

The issues with base64 or similar things are:

you cannot diff it naively or natively
it compresses less well (“zip”, double delta compression, …)
you need Base64 encoding/decoding as an extra step when the transport is JSON
it is not human-readable in its native form

This also applies to other binary formats and encodings.
JSON additionally has the advantage that basically all modern databases and other software support it natively.

As so often in software, there are pros and cons, and whether the approach fits depends on your needs. Of course, there are other options and paths one can choose for the implementation or architecture.