Serialisation & Atoms

rawkode · May 1, 2017, 11:29am

Hey all,

As I’m doing more and more Elixir, I’ve come to spot a very recurring situation:

We use :atoms in our code for structs and maps
We then shift some data off to ecto or mongo or neo4j
We pull that data back out and we have stringed keys

I’ve lost count of how many problems I’ve had because code is using :atom syntax and the keys are strings.

Should I just stop using :atoms? Is that what you’re doing?

I originally started looking at writing an atomise function, but I’m assuming this is a common use-case and I’d love to hear how others tackle this.

Thanks

Qqwy · May 1, 2017, 11:47am

The problem is that JSON (and SQL, and most other persistence layers) do not have a first-class atom type.

A proper and common approach is to call String.to_existing_atom/1 whenever you receive a string key from an external application (whether that is user input, database results, the result of an API call to an external application, etc.)
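A minimal sketch of that approach, assuming a flat map with string keys. The module KeyConversion and its atomise/1 function are hypothetical names used for illustration; only String.to_existing_atom/1 and Map.new/2 come from the standard library.

```elixir
defmodule KeyConversion do
  # Hypothetical helper, not from any library: converts the top-level string
  # keys of a map to atoms, but only to atoms that already exist in the VM.
  def atomise(map) when is_map(map) do
    Map.new(map, fn {key, value} when is_binary(key) ->
      {String.to_existing_atom(key), value}
    end)
  end
end

# :name and :email exist because they are written as atoms in this file.
expected = %{name: "Ada", email: "ada@example.com"}
result = KeyConversion.atomise(%{"name" => "Ada", "email" => "ada@example.com"})
IO.inspect(result == expected)

# A key with no matching atom raises ArgumentError instead of creating one.
try do
  KeyConversion.atomise(%{"no_such_atom_in_this_vm_4711" => 1})
rescue
  ArgumentError -> IO.puts("unknown key rejected")
end
```

Nested maps and lists are not walked here; a real helper would have to decide how deep to convert.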

rawkode · May 1, 2017, 11:53am

I don’t really think calling String.to_existing_atom/1 is proper, nor safe. For data that’s coming from an external source, I predict a lot of raised argument errors.

Perhaps just avoiding atoms for serialised structs / maps is the best approach, without too much boilerplate. My concern with that is that a few of the libraries I’m working with, such as Absinthe, actually expect atom keys.

benwilson512 · May 1, 2017, 11:58am

Absinthe only expects atom keys in your data in the sense that that’s the default way to look up data. You can change that default either by specifying a different way to get data for a field, or by setting a different default middleware.

The reason that Absinthe defaults to using atom key lookups is that most people are accessing structured data in their resolvers. This would be data like an Ecto schema. Ecto schemas and Ecto embedded schemas have well-defined fields, so atoms are an easy choice. They also handle decoding database information into these fields for you.

If you have particular GraphQL objects in Absinthe that you always want to load using data with string keys, you can do this trivially via the middleware/3 callback in your schema. For example, suppose you had an object named :config whose fields should default to string-key lookups.

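# Fields of the :config object that have no middleware of their own
# are looked up in the source map by their string key.
def middleware([], %{identifier: field_identifier}, %{identifier: :config}) do
  [{Absinthe.Middleware.MapGet, Atom.to_string(field_identifier)}]
end
# Every other field keeps the middleware it already has.
def middleware(middleware, _, _) do
  middleware
end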

rawkode · May 1, 2017, 12:02pm

Thanks, @benwilson512. That’s useful to know, with regards to working with Absinthe.

Perhaps going string-key only is the better, more maintainable solution for data in other systems.

benwilson512 · May 1, 2017, 12:06pm

I guess I’m not sure I understand why you’re saying this. If you have unstructured data then by definition you know nothing about its shape, including the keys, and thus those keys will be strings. If you want to validate the structure of the data you already need to walk through it and produce a validated result. If you’re already walking it you may as well convert those keys which are known to atoms.
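A rough sketch of converting keys during validation, assuming an event-like map. The module EventDecoder, its decode/1 function and the key names are hypothetical and chosen only for illustration. It rejects unknown keys but does not check for missing ones, and the nested payload is left string-keyed because its shape is not known.

```elixir
defmodule EventDecoder do
  # Hypothetical whitelist of the keys this application knows about.
  @known_keys %{"id" => :id, "type" => :type, "payload" => :payload}

  # Returns {:ok, map_with_atom_keys} or {:error, {:unknown_keys, keys}}.
  def decode(raw) when is_map(raw) do
    {known, unknown} = Map.split(raw, Map.keys(@known_keys))

    if map_size(unknown) == 0 do
      {:ok, Map.new(known, fn {key, value} -> {Map.fetch!(@known_keys, key), value} end)}
    else
      {:error, {:unknown_keys, Map.keys(unknown)}}
    end
  end
end

IO.inspect(EventDecoder.decode(%{"id" => 1, "type" => "created", "payload" => %{"a" => 1}}))
IO.inspect(EventDecoder.decode(%{"id" => 1, "extra" => true}))
```

Because the atoms come from the whitelist rather than from the input, no external string is ever turned into a new atom.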

The root issue with JSON is not that it lacks a way to have an atom type. Even if it did you still wouldn’t use it in Elixir, because then loading arbitrary JSON strings would produce limitless atoms. The reason we don’t use atoms with JSON is because JSON lacks a schema. When you serialize something with a schema to something without a schema you lose schema information. When you deserialize it you have to validate that it’s still the same shape you thought it was.
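To illustrate the "limitless atoms" concern: atoms are never garbage collected, and the VM has a fixed atom limit (1,048,576 by default). The sketch below assumes an OTP release recent enough to support :erlang.system_info(:atom_count), and the key names are made up to stand in for untrusted input.

```elixir
before_count = :erlang.system_info(:atom_count)

# Simulate 100 distinct keys arriving from an untrusted source.
for i <- 1..100 do
  String.to_atom("untrusted_key_#{System.unique_integer([:positive])}_#{i}")
end

after_count = :erlang.system_info(:atom_count)
created = after_count - before_count

# Every distinct key permanently occupied a slot in the atom table.
IO.inspect(created >= 100)
```

String.to_existing_atom/1 avoids this growth because it never adds an entry to the atom table.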

rawkode · May 1, 2017, 12:14pm

There’s actually a really cool JSON Schema in the making that a lot of the REST frameworks have adopted: http://json-schema.org/

The reason I suggested I opt for strings is because I’m using many external data sources: Elasticsearch, PostgreSQL, MongoDB and Neo4j in a single application and it’s become very cumbersome to fetch data from them and convert to atoms, at least not without putting my own abstraction in front of each. Perhaps that is the best way; it’s just frustrating every time I encounter this same problem.

benwilson512 · May 1, 2017, 12:21pm

Elasticsearch, PostgreSQL, MongoDB and Neo4j

For at least two of these (Postgres, Mongo) you can use Ecto and have a schema for your data, letting you use atom keys. Perhaps Elasticsearch and Neo4j do not, and that’s fine.

I suppose the principle I’m advocating here is: For things with structured data use atom keys, for data with unknown structure use string keys. If it’s Absinthe specifically that you feel like is making it difficult to handle data with both types of keys then we can talk more about that.

Qqwy · May 1, 2017, 12:48pm

And that is exactly the point. If you have defined a struct, and you want to build an instance of that struct from an external source, then the atoms representing the field names are already existing atoms, they were added during the definition of your struct. Therefore, an argument error will be raised if and only if the data you attempt to fill the struct with contains keys that are not part of the struct.
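A small sketch of that idea, assuming a struct with two fields. The Account module and its from_external/1 function are hypothetical. One caveat: String.to_existing_atom/1 succeeds for any atom that exists anywhere in the VM, not only for the struct’s fields, so the sketch relies on struct!/2 to reject atoms that exist but are not fields (it raises KeyError in that case).

```elixir
defmodule Account do
  defstruct [:name, :email]

  # Hypothetical constructor: builds the struct from a string-keyed map.
  def from_external(map) when is_map(map) do
    fields =
      Map.new(map, fn {key, value} -> {String.to_existing_atom(key), value} end)

    # struct!/2 raises KeyError for atoms that exist but are not fields.
    struct!(__MODULE__, fields)
  end
end

account = Account.from_external(%{"name" => "Ada", "email" => "ada@example.com"})
IO.inspect(account.name)
```

Both failure modes are exceptions here; a caller handling untrusted input would probably wrap the call and return an error tuple instead.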

The main reason to use atoms over strings is twofold:

It is faster to work with them and it takes less memory, as atoms are interned strings.
It is easier to work with atoms, as many parts of Elixir’s syntax are made to make atom-based manipulation easy. Consider the shorter map and keyword construction syntaxes, the static and dynamic access operators and the definition of structs.
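For comparison, here is how the syntax mentioned above looks with atom keys versus string keys. The data is made up for illustration.

```elixir
# Atom keys: shorthand construction, static and dynamic access.
user = %{name: "Ada", role: :admin}
IO.inspect(user.name)    # static access, raises KeyError if the key is missing
IO.inspect(user[:name])  # dynamic access, returns nil if the key is missing

# String keys: explicit arrows, dynamic access only.
raw = %{"name" => "Ada", "role" => "admin"}
IO.inspect(raw["name"])

# Keyword lists and pattern matching use the same shorthand.
opts = [timeout: 5_000, retries: 3]
IO.inspect(Keyword.fetch!(opts, :timeout))

%{role: role} = user
IO.inspect(role)
```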

And these are also the reasons that many libraries expect you to use atom keys in your structures.

Atoms clarify that your data has structure which is why they are used and should be used for structured data.

rawkode · May 1, 2017, 1:49pm

I don’t use Ecto, as it’s an event-sourced application and INSERT ONLY. Perhaps that’s also causing me problems, as I’m sure Ecto would solve some of these challenges

benwilson512 · May 1, 2017, 1:59pm

I don’t know about your use case, but what is incompatible about using Ecto with event sourcing and Postgres? Are you not actually reading out of Postgres directly from Elixir?

rawkode · May 1, 2017, 2:26pm

Mongo is my event-store and Ecto’s driver hasn’t been updated to Ecto 2. Plus, my usage is a single INSERT INTO events; I don’t really need any of Ecto for my use-case, it’d be redundant overhead.

I never read from my event store, normally, as I have a Kafka event-store subscriber that consumers read from.

Kafka is just another source where my structured data is considered unstructured.

lexmag · May 1, 2017, 4:30pm

Ideally, it is better to stick with string keys,
especially, as @benwilson512 said, for unstructured data.
For cases when atom keys are still needed (structs),
@whatyouhide and I made a library: https://hexdocs.pm/maptu/Maptu.html.
It gives fine control on what to do with unknown keys.

Overbryd · November 30, 2017, 9:06pm

Am I doing something wrong? This no longer works with Absinthe ~> 1.4: https://github.com/absinthe-graphql/absinthe/issues/268#issuecomment-348319739