Ecto preload function discarding data

I’m trying to preload data into a many_to_many relationship using a custom query to get at a complex set of data. (See: https://hexdocs.pm/ecto/Ecto.Query.html#preload/3-preload-functions for the docs)
The preload function gets executed and the data returned is correct. (By inspecting the results of the reload function). I’ve included the parent id (franchise_id) mapped into a virtual field of the schema returned. But… it seems that the data is thrown away and not mapped into my Franchise model. It comes out with an empty array on the franchise.franchise_groups property.

Any ideas? What am I missing to get this data into the Franchise struct?
Thanks.

Code snippets:

defmodule Molly.FranchiseGroup do
  use Molly.Schema, :model


  schema "franchise_groups" do
    field :name, :string
    field :start_season, :integer
    field :end_season, :integer, default: 3000

    belongs_to :parent, Molly.FranchiseGroup
    belongs_to :league, Molly.League

    many_to_many :franchises, Molly.Franchise, join_through: Molly.FranchiseAffiliation 

    field :path, {:array, :integer}, virtual: true
    field :franchise_id, :integer, virtual: true

    timestamps()
  end
end

defmodule Molly.Franchise do
  use Molly.Schema, :model

  schema "franchises" do
    field :name, :string

    has_many :teams, Molly.Team

    # This is kind of a fake for preloading groups in a hierarchy
    many_to_many :franchise_groups, Molly.FranchiseGroup, join_through: Molly.FranchiseAffiliation 

    timestamps()
  end

end

defmodule FranchiseQuery do

    def preload_groups(query, season) do
      groups_preload = fn ids -> groups_query(ids, season) |> Repo.all() end

      from f in query,
        preload: [franchise_groups: ^groups_preload]
    end

    defp groups_query(ids, season) do
      from fg in FranchiseGroup,
        inner_join: cte in fragment("""
         ... -- Trust me, a complex recursive CTE query that returns the data I need
          """, ^ids), on: fg.id == cte.id,
        where: fg.start_season <= ^season and fg.end_season >= ^season,
        select: %{ fg | path: cte.path, franchise_id: cte.franchise_id}
    end
end
1 Like

The only reason I can think for this to happen is because none of the relationships are mapping out. A simple way to solve this is to replace the complex query by a very simple query one that only filters on the given IDs and see if it works. And then progressively make it more complex until it breaks.

A pretty simple query fails in the same way. Is it something about a virtual field being used?
This one returns a single item, but when the Franchise comes out it has an empty array in the franchise_groups.

    defp groups_query(ids, season) do
      from fg in Molly.FranchiseGroup,
        inner_join: fa in Molly.FranchiseAffiliation, on: fg.id == fa.franchise_group_id,
        where: fa.franchise_id in (^ids),
        where: fg.start_season <= ^season and fg.end_season >= ^season,
        select: %{ fg | franchise_id: fa.franchise_id}
    end

Some IO.inspects below: The first list is the results of the preload function executing, the Franchise schema below it has the empty array

[%Molly.FranchiseGroup{__meta__: #Ecto.Schema.Metadata<:loaded, "franchise_groups">,
  end_season: 3000, franchise_id: 7142,
  franchises: #Ecto.Association.NotLoaded<association :franchises is not loaded>,
  id: 132, inserted_at: #DateTime<2018-05-03 20:56:53.018847Z>,
  league: #Ecto.Association.NotLoaded<association :league is not loaded>,
  league_id: nil, name: "ukgEwzHq",
  parent: #Ecto.Association.NotLoaded<association :parent is not loaded>,
  parent_id: 131, path: nil, start_season: 2016,
  updated_at: #DateTime<2018-05-03 20:56:53.018852Z>}]
%Molly.Franchise{__meta__: #Ecto.Schema.Metadata<:loaded, "franchises">,
 franchise_groups: [], id: 7142,
 inserted_at: #DateTime<2018-05-03 20:56:53.008229Z>,
 league: #Ecto.Association.NotLoaded<association :league is not loaded>,
 league_id: 11979, name: "Fighting 0ers",
 teams: #Ecto.Association.NotLoaded<association :teams is not loaded>,
 updated_at: #DateTime<2018-05-03 20:56:53.008235Z>}

So this works:

    defp groups_query(ids, season) do
      from fg in Molly.FranchiseGroup,
        inner_join: fa in Molly.FranchiseAffiliation, on: fg.id == fa.franchise_group_id,
        inner_join: f in Molly.Franchise, on: fa.franchise_id == f.id,
        where: f.id in (^ids),
        where: fg.start_season <= ^season and fg.end_season >= ^season,
        select: %{ fg | id: fa.franchise_id} // <-- NOTICE THIS
    end

I’m overriding the ID of the FranchiseGroup with the Franchise id.
It’s trying to match the id of the child (FranchiseGroup) to the id of the parent (Franchise).

That doesn’t seem right at all. Is this something to do with a many_to_many?

You can see below the ids of the parent and the children are the same.

%Molly.Franchise{__meta__: #Ecto.Schema.Metadata<:loaded, "franchises">,
 franchise_groups: [%Molly.FranchiseGroup{__meta__: #Ecto.Schema.Metadata<:loaded, "franchise_groups">,
   end_season: 3000,
   franchises: #Ecto.Association.NotLoaded<association :franchises is not loaded>,
   id: 7149, inserted_at: #DateTime<2018-05-03 21:11:22.965819Z>,
   league: #Ecto.Association.NotLoaded<association :league is not loaded>,
   league_id: nil, name: "zu3ntSa1",
   parent: #Ecto.Association.NotLoaded<association :parent is not loaded>,
   parent_id: nil, path: [145], start_season: 2016,
   updated_at: #DateTime<2018-05-03 21:11:22.965823Z>},
  %Molly.FranchiseGroup{__meta__: #Ecto.Schema.Metadata<:loaded, "franchise_groups">,
   end_season: 3000,
   franchises: #Ecto.Association.NotLoaded<association :franchises is not loaded>,
   id: 7149, inserted_at: #DateTime<2018-05-03 21:11:22.966917Z>,
   league: #Ecto.Association.NotLoaded<association :league is not loaded>,
   league_id: nil, name: "wuu7sc4i",
   parent: #Ecto.Association.NotLoaded<association :parent is not loaded>,
   parent_id: 145, path: [145, 146], start_season: 2016,
   updated_at: #DateTime<2018-05-03 21:11:22.96692Z>}], id: 7149,
 inserted_at: #DateTime<2018-05-03 21:11:22.959107Z>,
 league: #Ecto.Association.NotLoaded<association :league is not loaded>,
 league_id: 11986, name: "Fighting 0ers",
 teams: #Ecto.Association.NotLoaded<association :teams is not loaded>,
 updated_at: #DateTime<2018-05-03 21:11:22.959112Z>}

If I change the association to an invalid has_many, then it works:

defmodule Molly.Franchise do
  use Molly.Schema, :model

  schema "franchises" do
    field :name, :string
    belongs_to :league, Molly.League

    has_many :teams, Molly.Team

    # This is kind of a fake for preloading groups in a hierarchy
    # many_to_many :franchise_groups, Molly.FranchiseGroup, join_through: Molly.FranchiseAffiliation, join_keys: [franchise_id: :id, franchise_group_id: :id]
    has_many :franchise_groups, Molly.FranchiseGroup

    timestamps()
  end
end

Of course I get a warning: "warning: invalid association franchise_groups in schema Molly.Franchise: associated schema Molly.FranchiseGroup does not have field franchise_id"

As far as I can tell:

has_many through passes a list of child ids to the preload function and will exclude anything returned that doesn’t have an id that matches one of those ids. (Only allows a use case where you want the exact same items that would be returned from the main query). This is unexpected because it’s a different set of ids compared to the other cases. (Feels like a bug)

many_to_many passes the parent ids to the preload function and will exclude anything returned that doesn’t have an id that matches those ids. (Only would work accidentally if the parent and child had the same ids) (Feels like a bug)

has_many passes the parent ids to the preload function and will work as expected matching the parent id to the “parent_id” property on the child items. (Works as expected)

Yup, it does look like an Ecto bug. For many_to_many we are comparing the association owner id with the associated id. We need a way to instruct Ecto to look at something else but I am not quite sure how to do so. Can you please open up an issue and link to this thread in there? Thank you!

1 Like

Will do. I created a simple test app as well. Thanks for the help.

Done, with example app and reference to this discussion.

1 Like

I am getting a similar problem but I was using preload query instead of preload function. Basically my data got discarded even though when I copied the query from the debug log the objects are actually returned by the database.

Naturally my question is, is this being fixed already? I saw on Ecto 3 changelog

  • [Ecto.Repo] Allow many_to_many associations to be preloaded via a function (before the behaviour was erratic)

Is that the fix for this issue? And is that fix includes investigations on preload query?