Ecto preload bandwith/memory efficiency?

Let’s say I write an ecto query that fetches 1000 posts and preload the author field. Lets say the 1000 posts are all written by 2 authors. How many author structs are created? It is just the 2? Or is it one for each post? How many times is the author data returned from the db?

To get specific, lets say we write the query like this:

from(p in Post, where: in ^post_ids, preload: :author)

Is it more efficient to join the author first?

p in Post,
where: in ^post_ids, 
join: a in Author, on: == p.author_id,
preload: author: a

Or maybe to dedup the authors we need to group by author? I’ll probably end up testing all this, I just wanted to see what people thought.

1 Like

Preface: I’m assuming that Post has a belongs_to relationship to Author

The preload: :author form is going to do a three-step process:

  • load the Post structs
  • collect all the author_ids from all the Posts
  • load those with one query like id IN (...) and fill in the author fields on all the Posts

So there’s one Author struct per distinct author in the results for this approach.

Each Post’s fields appear on the wire once, as do each Author’s.

In the explicit join case, things go differently:

  • the query returns all the post and author columns together.
  • Ecto builds Post structs from the results and caches Authors as they are seen (IIRC)

There’s also only one Author struct per distinct author for this approach. See this thread for an example where that wasn’t what the poster wanted

Each Posts fields appear on the wire once, but each Author’s fields appear once per Post they are the author of.

In general, prefer the preload: :author form unless there’s a specific reason (filtering, etc) to work harder.

If you join the author then you’ll get 1000 author records transfered from the db, which are deduplicated within elixir to two author structs. When using preloads only the two authors will be transfered and loaded.