Virtual fields and many-to-many associations in Ecto

tao · June 29, 2019, 4:54pm

I’m setting up my schemas and migrations in a Phoenix app. I’m trying to create a many-to-many relation from a table called instances to itself, but I’m getting an error – I think this error shows that I don’t quite understand many-to-many associations in Ecto. Maybe someone here can help?

This is the schema in question:

defmodule Backend.Instance do
  use Ecto.Schema
  import Ecto.Changeset

  schema "instances" do
    field :domain, :string

    many_to_many :peers, Backend.Instance,
      join_through: Backend.InstancePeer,
      join_keys: [source_domain: :domain, target_domain: :domain]

    has_many :instance_peers, Backend.InstancePeer, foreign_key: :source_domain

    timestamps()
  end
end

defmodule Backend.InstancePeer do
  use Ecto.Schema
  import Ecto.Changeset

  schema "instance_peers" do
    belongs_to :source_domain, Backend.Instance, references: :domain, type: :string
    belongs_to :target_domain, Backend.Instance, references: :domain, type: :string
    timestamps()
  end

end

In the instances schema, the instance_peers field is there because I want to bulk insert associations without necessarily knowing the IDs of the instances – only the domain fields.

My migrations look like so:

def change do
    create table(:instances) do
      add :domain, :string, null: false

      timestamps()
    end

    create unique_index(:instances, [:domain])

    create table(:instance_peers) do
      add :source_domain, references(:instances, column: :domain, type: :string)
      add :target_domain, references(:instances, column: :domain, type: :string)

      timestamps()
    end

    create unique_index(:instance_peers, [:source_domain, :target_domain])
end

Now, I am trying to bulk-insert a list of instance_peer. peers below is a list of strings.

instance_peers =
  Enum.map(
    peers,
    &%InstancePeer{
      source_domain: curr_domain,
      target_domain: &1,
      inserted_at: now,
      updated_at: now
    }
  )

instance
|> Repo.preload(:instance_peers)
|> Ecto.Changeset.change()
|> Ecto.Changeset.put_assoc(:instance_peers, instance_peers)
|> Repo.update!()

I am getting an error that I do not understand. The call to Repo.preload/2 throws the following error:

** (Ecto.QueryError) /home/backend/deps/ecto/lib/ecto/association.ex:622: field `source_domain` in `where` is a virtual field in schema Backend.InstancePeer in query:

from i0 in Backend.InstancePeer,
  where: i0.source_domain == ^1,
  order_by: [asc: i0.source_domain],
  select: {i0.source_domain, i0}

    (elixir) lib/enum.ex:1940: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir) lib/enum.ex:1431: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (elixir) lib/enum.ex:1940: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ecto) lib/ecto/repo/queryable.ex:138: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:18: Ecto.Repo.Queryable.all/3
    (elixir) lib/enum.ex:1327: Enum."-map/2-lists^map/1-0-"/2

But source_domain is not a virtual field! As far as I understand it’s a real field in the database as defined in the migration. Have I misunderstood what virtual fields are?

idi527 · June 29, 2019, 4:57pm

You probably need to use source_domain_id instead of source_domain or whatever ecto generated there.

Can you show what keys are in the InstancePeer struct? Map.from_struct(%InstancePeer{}). There might be source_domain_id or source_domain_domain, that’s what actually gets read and written to the database, source_domain then get’s optionally preloaded, so it does act a bit like a virtual field, since it doesn’t really map to a column in the database table.

Also, check out insert_all, it would probably be a much simpler way to achieve what you want.

# get current peers
current_peers =
  "instance_peers"
  |> where(source_domain: ^curr_domain)
  |> select([p], p.target_domain)
  |> Repo.all()

wanted_peers_set = MapSet.new(peers)
current_peers_set = MapSet.new(current_peers)

# delete the peers we don't want
dont_want = current_peers_set |> MapSet.difference(wanted_peers_set) |> MapSet.to_list()

"instance_peers"
|> where(source_domain: ^curr_domain)
|> where([p], p.target_domain in ^dont_want)
|> Repo.delete_all([])

# insert the ones we don't have yet
new_instance_peers =
  wanted_peers_set
  |> MapSet.difference(current_peers_set)
  |> MapSet.to_list()
  |> Enum.map(
    &%{
      source_domain: curr_domain,
      target_domain: &1,
      inserted_at: now,
      updated_at: now
    }
  )

Repo.insert_all("instance_peers", new_instance_peers)

# the above should probably be done in a transaction

It can also probably be reduced to two sql queries (one delete with join or subquery, one insert).

idi527 · June 29, 2019, 5:16pm

Also, you probably need to call your fields :source and :domain in the schema blocks so that ecto wouldn’t get confused.

defmodule Backend.InstancePeer do
  use Ecto.Schema
  import Ecto.Changeset

  schema "instance_peers" do
    belongs_to :source, Backend.Instance, references: :domain, type: :string, foreign_key: :source_domain
    belongs_to :target, Backend.Instance, references: :domain, type: :string, foreign_key: :target_domain
    timestamps()
  end

end

tao · June 30, 2019, 9:40pm

This is very helpful – thank you! You’re right – the InstancePeer struct did have a source_domain_id field, but after following your suggestion to rename fields it’s a lot clearer how things work.

Your insert_all code is very helpful. I had to make sure that all the sets are sets of strings rather than structs, but it works great and is a lot more performant!

Thanks for taking the time to explain. Coming from Pythong and Django, Ecto feels a bit more like driving in manual, but also a lot more powerful once you wrap your head around it!