Overkill to use Ecto Schema to map external JSON to structs?

I’m currently working on an application for work that fits within our continuous delivery cycle. The application uses GitHub extensively, both receiving webhooks from GitHub and GETting and POSTing data via the GitHub API.

At present, we are working with GitHub data as string-keyed maps decoded directly from the received payload (e.g. Poison.decode!(github_payload)). I’m thinking through the pros and cons of mapping that data to structs to represent it internally in the application, and I’m weighing the option of using Ecto.Schema to define those structs and their nested relationships. None of this data is persisted on our end.

As one example, when the “status” of a commit changes in GitHub, our application receives a StatusEvent from GitHub’s webhook. The code below is an abbreviated example of using Ecto to map this data to structs:

defmodule Example.Github.StatusPayload do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :sha, :string
    field :description, :string
    field :state, :string
    embeds_one :commit, Example.Github.Commit
    embeds_many :branches, Example.Github.Branch
    embeds_one :repository, Example.Github.Repository
  end

  # Accepts either a raw JSON binary or an already-decoded map
  def from_json(data) when is_binary(data) do
    data |> Poison.decode!() |> from_json()
  end
  def from_json(data) when is_map(data) do
    %__MODULE__{}
    |> cast(data, [:sha, :description, :state])
    |> cast_embed(:commit)
    |> cast_embed(:branches)
    |> cast_embed(:repository)
    |> apply_changes()
  end
end

defmodule Example.Github.Commit do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :sha, :string
    field :url, :string
  end

  def changeset(struct, data) do
    struct |> cast(data, [:sha, :url])
  end
end

# etc.

Here is a test to show how this might be used in the application.

defmodule Example.Github.StatusPayloadTest do
  use ExUnit.Case
  alias Example.Github.{StatusPayload, Commit, Repository, Branch}

  setup do
    json_file = "test/support/fixtures/status_event.json"
    binary = File.read!(json_file)
    data = Poison.decode!(binary)
    {:ok, data: data, binary: binary}
  end

  test "parsing to structs from map", ctx do
    payload = StatusPayload.from_json(ctx.data)

    assert_payload_parsed_to_structs(payload)
    assert_branches_parsed_nested_structs(payload.branches)
  end

  def assert_payload_parsed_to_structs(payload) do
    assert %{
      commit: %Commit{},
      repository: %Repository{},
      branches: [_branch | _other]
    } = payload
  end

  def assert_branches_parsed_nested_structs(branches) do
    for branch <- branches do
      assert %Branch{commit: %Commit{}} = branch
    end
  end
end

My question: is this overkill for working with external JSON data? Each time I go down this route and begin setting up all the schema code for the various GitHub resources we use, I feel like I’m overdoing it. On the other hand, I can see advantages to having the data represented this way. What do you think?

14 Likes

I don’t feel like using Ecto in that way would be overkill. Actually, that’s one of my favorite features of Ecto.

Using Ecto like this gives you a struct and lets you perform Ecto changeset operations on it to validate your data. That’s one of the things I like most about Ecto: the data doesn’t need to be tied to any source (any DB, for example). Some time ago I was building a REST API and ended up doing much the same as you in order to validate and cast the incoming data, but I wanted a quick way to define my schemas for every endpoint, so I made params (just a wrapper around Ecto schemas).

So IMHO your approach looks more than valid to me :slight_smile:

2 Likes

@vic Thanks for your feedback. It’s encouraged me to continue down that route. I’m also looking forward to reading your params package.

1 Like

Agreed. Ecto was designed to work like that. :slight_smile:

You can also use changesets without schemas, if you want, but in its current version it does not support embeds: Ecto.Changeset — Ecto v3.11.1
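For example, a minimal schemaless changeset over the status fields from the earlier example might look like this (just a sketch; params here stands for the decoded webhook map):

import Ecto.Changeset

types = %{sha: :string, description: :string, state: :string}

{%{}, types}
|> cast(params, Map.keys(types))
|> validate_required([:sha, :state])
|> apply_changes()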

5 Likes

@vic @lancejjohnson I’d like to read a blog post explaining this approach; it crossed my mind a while ago for a very similar use case.

OK, piggybacking off this idea of whether it’s “overkill” to use structs for embedded JSON: I’m wondering the same thing about polymorphic embedded structs.

I have an events table that needs to support different kinds of JSON payloads for different events.

 schema "events" do
    field :aggregate_id, :binary_id
    field :version, :integer
    field :event_name, :string
    field :data, :map
  end

The shape of the JSON data depends on the event (so it doesn’t have a fixed struct), for example:

%Event{event_name: "create_inventory", data: %{sku: "H14000-WT", warehouse_id: "ABC", initial_qty: 5} ... }
%Event{event_name: "reset_inventory_quantity", data: %{new_qty: 5}, aggregate_uuid: xxxx-xxxx ... }

Using regular maps works but has the following issues:

  1. It’s a bit fragile constructing the data payload as a bare map. It would be nice to use structs that offer some validation.

%Event{event_name: "create_inventory", data: %CreateInventory{…}, …} |> Repo.insert

I don’t know how to do this yet; currently Ecto rejects the struct and needs a map during insert. Could a changeset be used to strip the struct into a map? (One possibility is sketched after this list.)

  2. When querying with Ecto, the JSON comes back as a map with string keys even though I use atom keys in the code. Since event sourcing is all about pulling up past events and replaying new events on top of them, I’ll need to do some extra work to force the map Ecto returns into atom keys. Where does that massaging go? And if I could figure that out, could I also just “cast” the returned result into the arbitrary struct I want based on the event_name?

How would that be achieved? Or is that overkill? Or is this (STI-ish thinking) an anti-pattern?
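For issue 1, the sketch I had in mind: build the payload as a plain struct for safety, then flatten it with Map.from_struct/1 just before insert (the CreateInventory fields mirror the example above):

defmodule CreateInventory do
  # Plain struct used only to validate the payload shape in memory
  @enforce_keys [:sku, :warehouse_id, :initial_qty]
  defstruct [:sku, :warehouse_id, :initial_qty]
end

data = %CreateInventory{sku: "H14000-WT", warehouse_id: "ABC", initial_qty: 5}

# Ecto's :map field expects a plain map, so strip the struct before insert
%Event{event_name: "create_inventory", data: Map.from_struct(data)}
|> Repo.insert()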

A similar question was once asked here but didn’t get much of an answer: http://stackoverflow.com/questions/40208167/polymorphic-embedded-structs

I would love some input or advice.

1 Like

You might want to define a custom Ecto.Type. Basically, custom types provide strategies for converting values between Elixir and the underlying type in the database; in your case, between a “json” map and Elixir structs. You’d however need to store the “data” type name (e.g. CreateInventory) so you know which struct to create when loading from the DB.
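A minimal sketch of such a type, assuming the stored map carries a "__type__" key as the discriminator (the key and module names are just illustrative):

defmodule MyApp.EventData do
  use Ecto.Type

  # Stored in the database as a plain JSON map
  def type, do: :map

  # Cast from application code: accept any of the known event structs
  def cast(%CreateInventory{} = event), do: {:ok, event}
  def cast(%ResetInventoryQuantity{} = event), do: {:ok, event}
  def cast(_), do: :error

  # Dump to the database: flatten the struct and tag it with its module name
  def dump(%module{} = event) do
    {:ok, event |> Map.from_struct() |> Map.put("__type__", Atom.to_string(module))}
  end

  def dump(_), do: :error

  # Load from the database: rebuild the struct named by the tag
  # (to_existing_atom assumes the event modules are already loaded)
  def load(%{"__type__" => module_name} = map) do
    module = String.to_existing_atom(module_name)
    fields = for {key, value} <- Map.delete(map, "__type__"), do: {String.to_existing_atom(key), value}
    {:ok, struct!(module, fields)}
  end

  def load(_), do: :error
end

With field :data, MyApp.EventData in the events schema, queries would then hand back the rebuilt structs (with atom keys) rather than string-keyed maps.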

1 Like

Looking at Ecto.Type: when casting one field I cannot look at the “type” coming from another field, right? So I would need to store the type in the map itself, sort of like adding __struct__ into the map.

@homanchou wondering if you ever figured this out?

I’m interested in doing something similar as well - basically having a polymorphic embedded_schema that can be cast/loaded from the DB with proper validations.

That was some time ago. I think I put it on the back burner and never returned to that spike. Now there is Commanded for event sourcing; if I were to start this up again I might start there instead of rolling my own.

@krb29 I published a library that brings support for polymorphic embeds :point_down:

Just a quick follow-up question on this same issue: is it possible to use Ecto to remap key names from the ones supplied in a JSON payload or map to the keys defined in the Ecto schema? If so, how would one go about achieving this…
…so far I’ve managed to set up mapping of maps to Elixir structs with Ecto using embedded_schema. I figured remapping would work by adding the :source option when defining the fields with Ecto.Schema.field/3, but that does not work. I then figured what I probably needed was a schema instead of an embedded_schema, given the issue with :source; but then again, the data I’m remapping is not stored anywhere; it’s just arbitrary maps that need to be remapped to other key names…
…any ideas are welcome
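Since field’s :source option only takes effect when data is loaded from a database, one way to get there for in-memory maps is to rename the keys yourself before calling cast/3. A minimal sketch (Example.Person and the incoming key names are made up):

defmodule Example.Person do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :name, :string
    field :email, :string
  end

  # Incoming JSON key => schema field name
  @key_map %{"fullName" => :name, "emailAddress" => :email}

  def from_map(map) do
    remapped =
      for {json_key, field} <- @key_map, Map.has_key?(map, json_key), into: %{} do
        {field, Map.fetch!(map, json_key)}
      end

    %__MODULE__{}
    |> cast(remapped, Map.values(@key_map))
    |> apply_changes()
  end
end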

One possible alternative to the Ecto or Poison approaches for parsing a JSON binary into a nested struct, with type validation and key remapping, is the combo of Jason + Kernel.defstruct/1 with @type t() :: ... + Domo.

Here is an example app that:

  • parses a JSON binary with Jason.decode/1
  • translates the returned map to the nested structs with custom MapShaper.from_map/3 by adopting the MapShaper.Target protocol; that includes:
    • filtering of JSON items with ExJSONPath.eval/2
    • remapping of key names to the struct field names
  • validates that the final nested struct conforms to the t() type with ensure_type_ok/1, added to the struct by Domo :smile:

The definition looks like the following:

defmodule JsonReply.ProductCatalog do
  @moduledoc false

  use Domo

  alias JsonReply.ProductCatalog.{ImageAsset, ProductEntry}

  defstruct image_assets: [%ImageAsset{}], product_entries: [%ProductEntry{}]

  @type t :: %__MODULE__{image_assets: [ImageAsset.t()], product_entries: [ProductEntry.t()]}

  defimpl MapShaper.Target do
    def translate_source_map(_value, map) do
      {:ok, product_entries} = ExJSONPath.eval(map, "$.entries[?(@.sys.contentType.sys.id == 'product')]")
      {:ok, image_assets} = ExJSONPath.eval(map, "$.assets[?(@.sys.type == 'Asset')]")

      %{"image_assets" => image_assets, "product_entries" => product_entries}
    end
  end
end

and parsing + validation of the JSON binary to the struct is as simple as:

with {:ok, map} <- Jason.decode(binary),
     catalog = MapShaper.from_map(%ProductCatalog{}, map, &maybe_remove_locale/1),
     {:ok, catalog} <- ProductCatalog.ensure_type_ok(catalog) do
  # ... use the valid catalog here ...
end

1 Like

Update of the previous post.

I’ve extracted the custom code into the Nestru library, which serializes between a JSON map and nested structs and supports key renaming.

The updated example is here; the key remapping and the subsequent validation look like the following:

defmodule JsonReply.ProductCatalog do
  @moduledoc false

  use Domo

  alias JsonReply.ProductCatalog.{ImageAsset, ProductEntry}

  defstruct image_assets: [%ImageAsset{}],
            product_entries: [%ProductEntry{}]

  @type t :: %__MODULE__{
          image_assets: [ImageAsset.t()],
          product_entries: [ProductEntry.t()]
        }

  defimpl Nestru.PreDecoder do
    def gather_fields_map(_value, _context, map) do
      with {:ok, product_entries} <- JSONPath.get_list(map, "$.entries[?(@.sys.contentType.sys.id == 'product')]"),
           {:ok, image_assets} <- JSONPath.get_list(map, "$.assets[?(@.sys.type == 'Asset')]") do
        {:ok, %{image_assets: image_assets, product_entries: product_entries}}
      end
    end
  end

  defimpl Nestru.Decoder do
    def from_map_hint(_value, context, _map) do
      {:ok,
       %{
         image_assets: &Nestru.from_list_of_maps(&1, ImageAsset, context),
         product_entries: &Nestru.from_list_of_maps(&1, ProductEntry, context)
       }}
    end
  end
end

parsing + validation of the JSON binary to the struct:

with {:ok, map} <- Jason.decode(binary),
     {:ok, catalog} <- Nestru.from_map(map, ProductCatalog, locale: "en-US"),
     {:ok, catalog} <- ProductCatalog.ensure_type_ok(catalog) do
  # ... use the valid catalog here ...
end

3 Likes