Is there a significant performance cost to preloading one association at a time?

Consider an ecto query to get a single podcast which has many comments. Each comment has an author. I have a query to preload them:

def get_podcast!(id) do
  Repo.get!(Podcast, id)
  |> Repo.preload([comments: [:author]])

But I want to order the comments chronologically, so I can display them chronologically (as one would expect). One option would be to re-write the query with this syntax:

  def get_podcast!(id) do from p in Podcast,
      where: == ^id,
      join: c in assoc(p, :comments),
      order_by: [desc: c.inserted_at],
      join: a in assoc(c, :author),
      preload: [comments: {c, author: a}]

Another option is this way:

def get_podcast!(id) do
  Repo.get!(Podcast, id)
  |> Repo.preload([comments: (from c in Comment, order_by: [desc: c.inserted_at])])
  |> Repo.preload([comments: [:author]])

I like this last way a bit more, but I’m wondering if it has a performance cost. Also, is there more compact syntax that’s still clear but doesn’t require calling preload twice?

As shown in the last example in the docs you can do:

# Use a two-element tuple for a custom query and nested association definition
query = from c in Comment, order_by: c.published_at
posts = Repo.preload posts, [comments: {query, [:replies, :likes]}]

which in your case you can use to only do a single Repo.preload/2 call.

As for performance, I am not too sure. However (if I recall correctly) when you use Repo.preload/2 Ecto will first query the associated comments and then reduce a list of the associated author_id which it will then use to query the authors table. Such that the resulting query to the authors table is something like SELECT ... FROM authors WHERE id IN [...].

Which means that even if two comments share a same author (or a same author_id rather) it is only selected exactly once. Whereas if you do a JOIN, you might be reaching to the authors table multiple times for a same author.

Also the amount of data to be transfered differs in the same way. Since for a JOIN you might be receiving the data for a same author on different rows - one per comment sharing a same author. Whereas by preloading you only do a couple selects for the data separately and then Elixir, or Ecto in this case, “joins” the data manually for you.

So I believe it’s a matter of data transfer + computation on the database side vs. having to traverse data in Elixir + separate queries.

1 Like

That’s exactly what I was looking for… and somehow not seeing as I read that page for 20m before posting here :smiley:


One other note of interest - the Repo.preload version will execute queries in parallel (using Task.async_stream) if there are multiple associations to preload:

Repo.get(Podcast, id)
|> Repo.preload([:images, comments: :author])

Here images and comments will be preloaded in parallel.