Consider an ecto query to get a single podcast which has many comments. Each comment has an author. I have a query to preload them:
def get_podcast!(id) do
Repo.get!(Podcast, id)
|> Repo.preload([comments: [:author]])
end
But I want to order the comments chronologically, so I can display them chronologically (as one would expect). One option would be to re-write the query with this syntax:
def get_podcast!(id) do
Repo.one from p in Podcast,
where: p.id == ^id,
join: c in assoc(p, :comments),
order_by: [desc: c.inserted_at],
join: a in assoc(c, :author),
preload: [comments: {c, author: a}]
end
Another option is this way:
def get_podcast!(id) do
Repo.get!(Podcast, id)
|> Repo.preload([comments: (from c in Comment, order_by: [desc: c.inserted_at])])
|> Repo.preload([comments: [:author]])
end
I like this last way a bit more, but I’m wondering if it has a performance cost. Also, is there more compact syntax that’s still clear but doesn’t require calling preload twice?
As shown in the last example in the docs you can do:
# Use a two-element tuple for a custom query and nested association definition
query = from c in Comment, order_by: c.published_at
posts = Repo.preload posts, [comments: {query, [:replies, :likes]}]
which in your case you can use to only do a single Repo.preload/2
call.
As for performance, I am not too sure. However (if I recall correctly) when you use Repo.preload/2
Ecto will first query the associated comments and then reduce a list of the associated author_id
which it will then use to query the authors table. Such that the resulting query to the authors table is something like SELECT ... FROM authors WHERE id IN [...]
.
Which means that even if two comments share a same author (or a same author_id
rather) it is only selected exactly once. Whereas if you do a JOIN, you might be reaching to the authors table multiple times for a same author.
Also the amount of data to be transfered differs in the same way. Since for a JOIN you might be receiving the data for a same author on different rows - one per comment sharing a same author. Whereas by preloading you only do a couple selects for the data separately and then Elixir, or Ecto in this case, “joins” the data manually for you.
So I believe it’s a matter of data transfer + computation on the database side vs. having to traverse data in Elixir + separate queries.
1 Like
That’s exactly what I was looking for… and somehow not seeing as I read that page for 20m before posting here 
Thanks.
One other note of interest - the Repo.preload
version will execute queries in parallel (using Task.async_stream
) if there are multiple associations to preload:
Repo.get(Podcast, id)
|> Repo.preload([:images, comments: :author])
Here images
and comments
will be preloaded in parallel.
4 Likes