Ecto 3: custom type within map/array not being loaded properly

(I’m using Ecto 3 {:ecto_sql, "~> 3.0"}, although I don’t think that’s related to the problem.)

I want to store a list of “notices” in my DB model to record issues with (e.g.) unclean data. You can think of it like Ecto changeset errors: a tag (e.g. an attribute name) associated with an array of these notices.

I’m trying to use a custom type for this, but when the custom type is nested within other native DB types, it doesn’t appear to get put into the expected struct when loaded from the DB.

Here’s my type definition:

defmodule TypeDemo.Notice do
  @behaviour Ecto.Type

  @enforce_keys [:message, :values]
  @derive {Jason.Encoder, only: [:message, :values]}
  defstruct [:message, :values]

  def new(message, %{} = values \\ %{}) when is_binary(message) do
    %__MODULE__{
      message: message,
      values: values
    }
  end

  @impl Ecto.Type
  def type, do: :map

  @impl Ecto.Type
  def cast(%{"message" => message, "values" => values})
      when is_binary(message) and is_map(values) do
    {:ok, struct!(__MODULE__, %{message: message, values: values})}
  end

  def cast(%__MODULE__{} = notice), do: {:ok, notice}

  def cast(_), do: :error

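  @impl Ecto.Type
  # `data` is decoded JSON, so all keys are strings. String.to_existing_atom/1
  # raises ArgumentError for a key that was never used as an atom.
  def load(data) when is_map(data) do
    data = for {key, val} <- data, into: %{}, do: {String.to_existing_atom(key), val}

    # Atom *values* do not survive the jsonb round trip: :bar is loaded as "bar".
    values =
      for {key, val} <- data.values, into: %{}, do: {String.to_existing_atom(key), val}

    {:ok, struct!(__MODULE__, %{data | values: values})}
  end

  def load(_), do: :error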

  @impl Ecto.Type
  def dump(%__MODULE__{} = notice), do: Ecto.Type.dump(:map, notice)
  def dump(_), do: :error
end

Here’s the relevant migration:

defmodule TypeDemo.Repo.Migrations.AddPosts do
  use Ecto.Migration

  def change do
    create table("posts") do
      add(:name, :string, null: false)
      add(:notice, :map, null: false, default: %{})
      add(:issues, {:map, {:array, :map}}, null: false)
    end
  end
end

And here’s the “model”:

defmodule TypeDemo.Post do
  use Ecto.Schema

  import Ecto.Changeset

  alias TypeDemo.Notice

  schema "posts" do
    field(:name, :string)
    field(:notice, Notice)
    field(:issues, {:map, {:array, Notice}})
  end

  @doc false
  def changeset(changeset, attrs) do
    changeset
    |> cast(attrs, [
      :name,
      :notice,
      :issues
    ])
    |> validate_required([
      :name
    ])
  end
end

And here’s an IEx session:

import Ecto.{Changeset, Query}

alias TypeDemo.{Repo, Post, Notice}

notice = %Notice{message: "notice message", values: %{foo: :bar}}
base = change(%Post{}, %{name: "some name"})

c = TypeDemo.Post.changeset(base, %{issues: %{name: [notice, notice]}, notice: notice})

{:ok, post} = Repo.insert(c)

Which outputs:

08:50:52.007 [debug] QUERY OK db=19.7ms decode=2.4ms queue=0.9ms
INSERT INTO "posts" ("issues","name","notice") VALUES ($1,$2,$3) RETURNING "id" [%{name: [%TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}, %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}]}, "some name", %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}]
{:ok,
 %TypeDemo.Post{
   __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
   id: 1,
   issues: %{
     name: [
       %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}},
       %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}
     ]
   },
   name: "some name",
   notice: %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}
 }}
iex(3)> post
%TypeDemo.Post{
  __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
  id: 1,
  issues: %{
    name: [
      %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}},
      %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}
    ]
  },
  name: "some name",
  notice: %TypeDemo.Notice{message: "notice message", values: %{foo: :bar}}
}

As you can see, the various Notice instances have been properly cast. But here’s the problem: executing post = Repo.one(from p in Post, where: p.id == 1) yields:

%TypeDemo.Post{
  __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
  id: 1,
  issues: %{
    "name" => [
      %{"message" => "notice message", "values" => %{"foo" => "bar"}},
      %{"message" => "notice message", "values" => %{"foo" => "bar"}}
    ]
  },
  name: "some name",
  notice: %TypeDemo.Notice{message: "notice message", values: %{foo: "bar"}}
}

While the notice attribute was properly loaded and converted into the Notice struct, the nested versions in issues were not.

So: am I missing something or doing something wrong? Is this expected behavior from Ecto?

Why are you using :map and :array? Pick just one.

Assuming you’re referring to add(:issues, {:map, {:array, :map}}, null: false) in the migration, it’s not map and array: it specifies a map type whose values are arrays of maps (i.e. each map key has a list of Notices as the associated value). Or are you referring to something else?

I don’t know whether Ecto supports such nested types. This field will just be jsonb and will be treated as such, ignoring {:array, type}, especially in migrations.

It does, but as you said it doesn’t change anything for a migration in postgres.


I’d like to add that these are different at the database level: {:array, :map} would be a Postgres array of jsonb values, while :map is a single jsonb column with everything inside. Querying is much less confusing with the latter; querying arrays of jsonb objects is a pain.

Oh, and by the way: you don’t need a custom type here; you can use a nested (embedded) schema.
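Here is a rough sketch of what the embedded-schema suggestion could look like. It assumes the issues map is reshaped into a list of `{attribute, notices}` entries, because embeds_many stores a list rather than a map keyed by attribute. The module names `TypeDemo.Issue` and `TypeDemo.Issue.Notice` are hypothetical, and the sketch has not been checked against a specific Ecto 3.x release:

```elixir
# Hypothetical embedded schema for a single notice (no table, stored inside jsonb).
defmodule TypeDemo.Issue.Notice do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field(:message, :string)
    field(:values, :map, default: %{})
  end

  def changeset(notice, attrs) do
    notice
    |> cast(attrs, [:message, :values])
    |> validate_required([:message])
  end
end

# Hypothetical embedded schema: one entry per attribute that has problems.
defmodule TypeDemo.Issue do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field(:attribute, :string)
    embeds_many(:notices, TypeDemo.Issue.Notice)
  end

  def changeset(issue, attrs) do
    issue
    |> cast(attrs, [:attribute])
    # Uses TypeDemo.Issue.Notice.changeset/2 for each nested notice.
    |> cast_embed(:notices)
  end
end
```

In `TypeDemo.Post`, the field would then become `embeds_many(:issues, TypeDemo.Issue, on_replace: :delete)` and be cast with `cast_embed(:issues)`. Ecto would load the entries back as structs. The trade-off is that looking up an attribute goes through a list (e.g. `Enum.find/2`) instead of a map key.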

You make a good point about the data type in the migration: it should indeed be simplified to just :map (i.e. jsonb).

In my understanding, embedded schemas are essentially “normal” schemas without a DB table backing them. In other words, they require a relationship to the persisted schema, which is absent in this case: the Notices are within the map’s values (themselves being arrays). Would you by any chance have a link to somewhere I can read up on what you’re proposing?

In addition, I’d like to be able to have “issues” with notices on multiple different models.

The context for this is an ETL scenario: the data MUST be saved (so no usual Ecto validations), but I want to persist information about potential data quality issues so they can be reviewed/fixed.

For clarity, what I’m trying to achieve is to get something like this after loading from the DB:

%TypeDemo.Post{
  __meta__: #Ecto.Schema.Metadata<:loaded, "posts">,
  id: 1,
  issues: %{
    "name" => [
      %TypeDemo.Notice{message: "has some problem", values: %{foo: "bar"}},
      %TypeDemo.Notice{message: "has another problem too", values: %{flim: "flam"}}
    ],
    "language" => [
      %TypeDemo.Notice{message: "unknown language", values: %{}}
    ]
  },
  name: "some name"
}

Since I’d want the Notice values to have a consistent structure, I wanted to convert the relevant data into Notice structs when a record is loaded. But perhaps this isn’t possible? Maybe I need to just store the raw map in jsonb, then Enum.map/2 them into structs when I intend to use them (although that sounds a bit annoying…)?
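One way to get structs at load time is sketched below, under some assumptions. A single custom type owns the whole `issues` map and runs each nested element through the existing `Notice.cast/1`, `Notice.load/1` and `Notice.dump/1` callbacks by hand. `TypeDemo.Issues` and its `traverse/2` helper are hypothetical names. The module assumes the `TypeDemo.Notice` type from the question and follows the question's Ecto 3.0-style `@behaviour Ecto.Type`:

```elixir
defmodule TypeDemo.Issues do
  # Hypothetical custom type for the whole field:
  # %{attribute => [%TypeDemo.Notice{}, ...]} stored in one jsonb column.
  @behaviour Ecto.Type

  alias TypeDemo.Notice

  @impl Ecto.Type
  def type, do: :map

  @impl Ecto.Type
  def cast(issues) when is_map(issues), do: traverse(issues, &Notice.cast/1)
  def cast(_), do: :error

  # Receives decoded JSON (string keys); each notice is loaded explicitly.
  @impl Ecto.Type
  def load(issues) when is_map(issues), do: traverse(issues, &Notice.load/1)
  def load(_), do: :error

  @impl Ecto.Type
  def dump(issues) when is_map(issues), do: traverse(issues, &Notice.dump/1)
  def dump(_), do: :error

  @doc "Applies `fun` to every notice in every list; returns :error if any call fails."
  def traverse(issues, fun) do
    Enum.reduce_while(issues, {:ok, %{}}, fn {attribute, notices}, {:ok, acc} ->
      case traverse_list(notices, fun) do
        {:ok, mapped} -> {:cont, {:ok, Map.put(acc, attribute, mapped)}}
        :error -> {:halt, :error}
      end
    end)
  end

  # Maps `fun` over one list of notices, stopping at the first failure.
  defp traverse_list(notices, fun) when is_list(notices) do
    result =
      Enum.reduce_while(notices, {:ok, []}, fn notice, {:ok, acc} ->
        case fun.(notice) do
          {:ok, value} -> {:cont, {:ok, [value | acc]}}
          _ -> {:halt, :error}
        end
      end)

    with {:ok, reversed} <- result, do: {:ok, Enum.reverse(reversed)}
  end

  defp traverse_list(_, _), do: :error
end
```

With this type, the schema field would be declared as `field(:issues, TypeDemo.Issues)` and the column migrated as a plain `:map` (jsonb). Attribute names come back from jsonb as strings, so loaded keys would be `"name"` rather than `:name`, which matches the desired output above. On later Ecto 3.x releases, `use Ecto.Type` is likely the more appropriate way to define a type, since it supplies defaults for callbacks added after 3.0.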

I’m having trouble understanding what you mean. Would you be able to give an example, or link to some docs/articles on it?

Postgres has an array type (https://www.postgresql.org/docs/10/arrays.html) and JSON types (https://www.postgresql.org/docs/10/datatype-json.html).

When you declare a map field (an Elixir map is jsonb at the DB level), you can put everything inside it, including arrays; these are JSON arrays within your JSON data.

When you declare an array of maps ({:array, :map}) in your migration, you get Postgres array with jsonb elements.

When you query the data, it’s much easier with the “everything in one jsonb field” version than with the array-of-jsonb-elements one; that’s what I meant.
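The sketch below illustrates the difference with a containment query against a single jsonb `issues` column. It assumes PostgreSQL and the `posts` table from the migration above. `TypeDemo.IssueQueries.with_notice/2` is a hypothetical helper. The same lookup on a Postgres array of jsonb elements would typically need `unnest` or a subquery instead of one operator:

```elixir
defmodule TypeDemo.IssueQueries do
  import Ecto.Query

  # Ids of posts whose jsonb `issues` column has a notice for `attribute`
  # with the given message. Postgres jsonb containment (@>) matches
  # {"name": [{"message": "..."}]} against a document whose "name" array
  # holds an object with that message, whatever other keys it has.
  def with_notice(attribute, message) do
    pattern = %{attribute => [%{"message" => message}]}

    from(p in "posts",
      where: fragment("? @> ?", p.issues, ^pattern),
      select: p.id
    )
  end
end
```

Running it would look like `Repo.all(TypeDemo.IssueQueries.with_notice("name", "has some problem"))`. The map parameter is sent as JSON, and Postgres infers it as jsonb from the `@>` operator.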

The Ecto docs briefly mention the distinction here: https://hexdocs.pm/ecto/Ecto.Schema.html#embeds_many/3

It is recommended to declare your embeds_many/3 field with type :map in your migrations, instead of using {:array, :map} . Ecto can work with both maps and arrays as the container for embeds (and in most databases map are represented as JSON which allows Ecto to choose what works best).

For a bit more context: https://thoughtbot.com/blog/why-ecto-s-way-of-storing-embedded-lists-of-maps-makes-querying-hard
