I have loads of structs defined via typed_struct and would like to serialize and deserialize these structs to and from JSON. I am currently using Jason, but I am not tied to that at all. Serializing any of my structs into JSON is simple: deriving Jason.Encoder works. An example structure that (hopefully) contains everything I need could look like this:
```elixir
require Protocol

defmodule Example do
  use TypedStruct

  @type id :: Ecto.UUID.t()

  typedstruct module: Presence do
    field :joined_since, DateTime.t(), enforce: false
    field :reason, String.t()
  end

  typedstruct enforce: true do
    field :id, id
    field :name, String.t()
    field :presences, %{optional(Presence.id) => Presence.t()}
  end

  def build() do
    %__MODULE__{
      id: Ecto.UUID.generate(),
      name: "Some Guest",
      presences: %{
        "#{Ecto.UUID.generate()}": %Presence{
          joined_since: DateTime.utc_now(),
          reason: "Creation"
        }
      }
    }
  end
end

Protocol.derive(Jason.Encoder, Example)
Protocol.derive(Jason.Encoder, Example.Presence)
```
Implementing serialization and deserialization for this however comes with two problems:
By default I am losing the type when serializing as JSON: from my point of view the easiest thing would be to tag each JSON object with its type. As far as I understand it, this is also how a "normal" map is differentiated from a struct: by having a :__struct__ field present in the map. If I could just serialize this field alongside the normal data, the resulting map would be properly treated as a struct. I am already using the :__struct__ field to know which module I want to call functions on, so to me this seems like a natural thing to do. But it at least doesn't seem to be encouraged by Jason: I haven't found any way of doing this except for repeating all the keys in the only: option to the protocol derivation or providing a manual implementation.
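Concretely, the manual implementation I have in mind boils down to lowering the struct to a plain map that carries the module name as a string (a sketch, the `TaggedJSON` name is made up):

```elixir
defmodule TaggedJSON do
  # Lower a struct to a plain map, keeping the module name as a
  # string under "__struct__" so the type survives the round trip
  # (JSON has no atoms, so the value must become a string).
  def tag(struct) when is_struct(struct) do
    struct
    |> Map.from_struct()
    |> Map.put("__struct__", Atom.to_string(struct.__struct__))
  end
end
```

A `defimpl Jason.Encoder` for each struct could then pipe `tag(struct)` into `Jason.Encode.map/2`.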
I do fear however that whatever I am doing is misguided. Because when serializing and deserializing such a struct (using the derived protocol from 1.) I made some observations:
At first I was pleasantly surprised to discover that the keys: :atoms option did not turn the UUID that is part of the presences key into an atom. That saves me from a self-inflicted atom exhaustion, I guess, but I don't understand why the UUID wasn't converted.
It also didn't "recover" the DateTime, however, but left it as a string. I guess this is to be expected, as there is no "proper" JSON representation for date and time.
The resulting map suddenly has the :__struct__ "visible" as part of my iex representation. So something about the map I recovered is not quite a "proper" struct.
This leaves me with the following questions:
What is a better way to tell Jason to include the __struct__ when serializing? So far I would probably write a manual implementation and re-use that. Or is there a reason I really should not do this?
What is different from a “proper” struct about the map I get from calling encode! and decode!?
Should I just abandon the idea of serializing to JSON and “simply” write the binary representation into my PostgreSQL database?
The UUID key in your output is an atom. If it was a string, the syntax would be "bfa..." => %{ instead of "bfa...": %{. As you suspected, this is risky.
Your __struct__ key has been included in the JSON as a string, and thus retrieved back as a string. But the value in an Elixir struct is an atom (the module name). So your struct is broken and Elixir shows it as a regular map. Jason doesn’t support decoding data into structs so you’ll need to do that yourself.
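Doing it yourself can be a small function: re-atomize the stored module name with String.to_existing_atom/1 (which refuses to create new atoms, so untrusted input cannot exhaust the atom table) and rebuild the struct. A sketch (the module and function names are made up):

```elixir
defmodule Revive do
  # Rebuild a struct from a Jason-decoded map (string keys) that
  # carries the module name under "__struct__".
  # Raises if the module or any field atom does not already exist,
  # which is exactly what we want for untrusted input.
  def revive(%{"__struct__" => mod_name} = map) do
    module = String.to_existing_atom(mod_name)

    fields =
      map
      |> Map.delete("__struct__")
      |> Map.new(fn {k, v} -> {String.to_existing_atom(k), v} end)

    struct!(module, fields)
  end
end
```

Note this only handles one level; nested structs would need a recursive pass, and values like the DateTime string still need their own coercion.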
I’d decode with Jason without supplying keys: :atoms (because it’s dangerous) and then look for some library to validate/transform the data forward from that point. I think Ecto can also be used for this but I don’t have experience with that.
Tbh, instead of questioning JSON vs. binary-to-term, I'd question whether it is a good idea to try to store structs at all; imo the answer is no.
Structs are a datatype whose lifetime is coupled to the code running. Each time code is changed the potential exists for a struct definition to have changed. Data stored in a db usually has a lifetime longer than that. Therefore you want to lower values to a simpler format and explicitly transform to and from those higher level values.
That transformation layer is then the place where you can upcast or downcast between distinct versions of a struct.
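As a sketch of such a layer (all names made up): dump/1 lowers to a versioned plain map, and load/1 upcasts older versions step by step before handing the data onward:

```elixir
defmodule GameState.Storage do
  @current_version 2

  # Lower to a plain, versioned map before storing.
  def dump(state) do
    %{"version" => @current_version, "name" => state.name}
  end

  # v1 snapshots predate the :name field; upcast with a default,
  # then re-dispatch as v2.
  def load(%{"version" => 1} = data) do
    data
    |> Map.put("name", "unknown")
    |> Map.put("version", 2)
    |> load()
  end

  def load(%{"version" => 2} = data) do
    %{name: data["name"]}
  end
end
```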
Not a perfect solution to your issue, but I've been using the data_schema library for quite some time now and am very happy with it. We have extended it to include a lot more functionality to fit our business case better.
You might want to take a look at the estructura library, and specifically at the Estructura.User example, which solves exactly the issues you've described:
- loading nested map data into structs
- coercion of types (like an ISO-8601 binary into a DateTime struct)
- input validation
- generation of StreamData "instances" for property-based testing
Nitpick: build here is not building the type that you've declared for presences: it produces a map with atom keys, but the type specifies Ecto.UUID.t() (i.e. string) keys.
You’d write it as (not tested, apologies if it’s wrong):
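Something along these lines, using `=>` so the key stays a binary:

```elixir
def build() do
  %__MODULE__{
    id: Ecto.UUID.generate(),
    name: "Some Guest",
    presences: %{
      # `=>` keeps the UUID a string; the original `"#{...}":`
      # syntax interpolated it into an atom key
      Ecto.UUID.generate() => %Presence{
        joined_since: DateTime.utc_now(),
        reason: "Creation"
      }
    }
  }
end
```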
In order to rectify this you could create a simple mapping atom => module i.e. :my_struct => MyApp.Example.MyStruct and store a type: :my_struct in each JSON blob. That will at least save you if you ever need to rename your modules.
Then you could implement default values in your struct definitions when you add new fields for backwards compatibility. You could also drop fields which are no longer found in the struct. You could add type checking…
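A sketch of that mapping (module and tag names made up), with lookups in both directions:

```elixir
defmodule StructTypes do
  # One stable tag per struct: renaming a module only means
  # updating this map, while stored data keeps its old tags.
  @types %{
    my_struct: MyApp.Example.MyStruct
  }

  def module_for(type), do: Map.fetch!(@types, type)

  def type_for(module) do
    {type, ^module} = Enum.find(@types, fn {_t, m} -> m == module end)
    type
  end
end
```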
And of course, by the time you are finished, you will have reimplemented Ecto on top of JSON.
There are two real, underlying problems here: first, Ecto has no polymorphic embeds (which is what you’re really trying to do with the structs). And second, relational databases fail to properly model polymorphic relations because foreign key references are too rigid (restricted to a single table).
For us, the former problem is probably easier to solve (though personally I am far more interested in the latter).
Thank you all, what an amazingly welcoming place producing a plethora of useful information this forum is.
In hindsight I should have clarified that the data to be stored is comparatively short-lived. It's a long-lived process for an asynchronous turn-based game, and I was mainly looking for a way to make it survive server restarts, and possibly to periodically store it away just in case.
I sort of liked the idea to have the data queryable in PostgreSQL as it has some amazing JSON capabilities. But I guess I will go with the YAGNI route and start with the simplest solution (binary storage) and worry about the implications of version updates if I ever make it that far. But I will at least tag the data with some kind of version at the root and hope that I am smart enough to keep in mind bumping that version if I make incompatible changes.
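For the binary route, what I have in mind is roughly this (a sketch; note that `binary_to_term/1` on untrusted input is unsafe, which I'm accepting here because only my own server writes these rows):

```elixir
defmodule Snapshot do
  @version 1

  # Wrap the state in a version tag before serializing, so an
  # incompatible change can be detected (and upcast) on load.
  def dump(state), do: :erlang.term_to_binary({@version, state})

  def load(binary) do
    case :erlang.binary_to_term(binary) do
      {1, state} -> state
      {v, _state} -> raise "unknown snapshot version #{v}"
    end
  end
end
```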
Last but not least: thanks a lot for all the library suggestions. It seems like there is a solution for every desired level of complexity if I ever make it far enough. That's reassuring.
You can use a JSON library like Jason to go from a JSON binary to a graph of Elixir data structures (maps and lists) and then define Ecto embedded schemas to take that as "params" and generate a graph of structs (all Ecto schemas are structs too), using all the casting power Ecto types offer. I gave a talk a few years back, "Ecto without a DB", exploring that when I worked at a company transforming a lot of JSON from 3rd-party APIs.
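A sketch of the second half (assuming Ecto is available; the schema fields here are illustrative, not from the original post):

```elixir
defmodule GameState do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :id, Ecto.UUID
    field :name, :string
    field :joined_since, :utc_datetime
  end

  # Cast decoded JSON (string keys) into a validated struct;
  # Ecto recovers richer types like DateTime from their
  # string representations along the way.
  def from_params(params) do
    %__MODULE__{}
    |> cast(params, [:id, :name, :joined_since])
    |> apply_action(:insert)
  end
end

# Usage: Jason.decode!(json) |> GameState.from_params()
```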
Easily, yes. You can use the :embedded data layer to model an Ash.Resource that behaves like an Ash.Type.
```elixir
defmodule MyApp.Profile do
  use Ash.Resource, data_layer: :embedded

  attributes do
    attribute :bio, :string, allow_nil?: false
  end
end

# on `User` for example
attribute :profile, MyApp.Profile
```
The above has various interesting benefits like the ability to leverage policies, validations, custom actions on the resource to handle updating logic etc.
For simple map → struct validation, you can use the :struct type with the :fields constraint. I typically recommend using Ash.Type.NewType for this, which creates a new type based off of another type and constraints. For example:
```elixir
defmodule MyApp.Profile do
  defstruct [:bio]

  use Ash.Type.NewType,
    subtype_of: :struct,
    constraints: [
      instance_of: __MODULE__,
      fields: [
        bio: [type: :string, allow_nil?: false, constraints: [min_length: 25]]
      ]
    ]
end
```
You can use custom types as field types, etc., allowing for pretty much anything you could want to do.