How to properly filter/ modify preload queries in ecto? (simple preload, join-preload or other)

Hello everyone, I was working on a project and was faced with this problem and I’d like some input from you on the subject. Imagine this scenario:

  • Posts can be published and reviewed.
  • A Post has many Authors associated with it and vice-versa.
  • Authors can review Posts (identified by the reviewer column in the association).
  • Post(s) can be listed with its authors which must have the virtual reviewer flag filled.

schema "posts" do
  field :title, :string
  field :body, :string
  many_to_many :authors, Author, join_through: AuthorPost, on_replace: :delete)
end
schema "authors" do
  field :name, :string
  field :email, :string
  many_to_many :posts, Post, join_through: AuthorPost, on_replace: :delete)
  field :reviewer, virtual: true
end
 @primary_key false
  schema "authors_posts" do
    field :author_id, :integer, primary_key: true
    field :post_id, :integer, primary_key: true
    field :reviewer, :boolean
  end

The way I see there are three ways that we could approach this at first:

  1. Use a preload-join to load authors associated with posts in a single query. This way I can’t fill the reviewer field. Also, I’m aware that it may not be the best option to do a preload-join for has_many and many_to_many associations as explained here:
Repo.all(
  from p in Post, 
  join: a in assoc(p, :authors), 
  preload: [authors: a]
)

  1. Use a “simple” preload without a join, to load authors associated with posts issuing two queries. This suffers the same problem as the first option because I can’t customize how it’s being fetched from the database. It seems though that the query Ecto outputs for this is a little bit more optimized than whatever I’m doing right now:
Repo.all(
  from p in Post,
  preload: [:authors]
)
SELECT a0.*, p1."id" 
FROM "authors" AS a0 
INNER JOIN "posts" AS p1 
ON p1."id" = ANY($1) 
INNER JOIN "authors_posts" AS a2 
ON a2."post_id" = p1."id" 
WHERE (a2."author_id" = a0."id") 
ORDER BY p1."id" [[1, 2, 3, 4, 5, 6, 7, 8, 9]]

  1. Use a custom preload query, to load authors associated with posts issuing two queries. This allows me to better configure how I want to return the Authors, but the query generated seems a little bit too complex if you consider the second option:
query = 
  from a in Author,
  join: ap in AuthorPost,
  on: a.id == ap.author_id,
  select_merge: %{reviewer: ap.reviewer}

Repo.all(
  from p in Post,
  preload: [authors: ^query]
)
SELECT a0.*, p2."id" 
FROM "authors" AS a0 
INNER JOIN "authors_posts" AS a1 
ON a0."id" = a1."author_id" 
INNER JOIN "posts" AS p2 
ON p2."id" = ANY($1) 
INNER JOIN "authors_posts" AS a3 
ON a3."post_id" = p2."id" 
WHERE (a3."author_id" = a0."id") 
ORDER BY p2."id" [[1, 2, 3, 4, 5, 6, 7, 8, 9]]

So, how would you approach doing this, considering that you must retrieve the records through a Post and not loading the associations directly from the database? I figured that this might be a very common use case, in the end, I went with this last option, but I’d like to explore how to better use Ecto for this purpose.

Some things that I’ve learned fiddling with Ecto SQL this week:

  • It seems that Ecto does not support preloading from a subquery (bummer)
  • It seems that Ecto does not let you customize your join queries (only where's are aloud)
  • Sometimes it may be better to do two queries to preload instead of a preload-join (not so clear in my mind yet tbh)
3 Likes

Hey @thiagomajesk any news about it? I have the same problem here.

Update: Regarding my remarks about option 3: I ended up using Ecto’s preload functions to get a more optimized version of the query I needed (more optimized considering what Ecto was generating by default at least).

3 Likes

See my last comment/update. I’m still waiting to see if anyone wants to wage on this though.