Let’s say I write an ecto query that fetches 1000 posts and preload the author field. Lets say the 1000 posts are all written by 2 authors. How many author structs are created? It is just the 2? Or is it one for each post? How many times is the author data returned from the db?
To get specific, lets say we write the query like this:
from(p in Post, where: p.id in ^post_ids, preload: :author)
Is it more efficient to join the author first?
p in Post,
where: p.id in ^post_ids,
join: a in Author, on: a.id == p.author_id,
preload: author: a
Or maybe to dedup the authors we need to group by author? I’ll probably end up testing all this, I just wanted to see what people thought.
Preface: I’m assuming that
Post has a
belongs_to relationship to
preload: :author form is going to do a three-step process:
- load the
- collect all the
author_ids from all the
- load those with one query like
id IN (...) and fill in the
author fields on all the
So there’s one
Author struct per distinct author in the results for this approach.
Post’s fields appear on the wire once, as do each
In the explicit join case, things go differently:
- the query returns all the post and author columns together.
- Ecto builds
Post structs from the results and caches
Authors as they are seen (IIRC)
There’s also only one
Author struct per distinct author for this approach. See this thread for an example where that wasn’t what the poster wanted
Posts fields appear on the wire once, but each
Author’s fields appear once per
Post they are the author of.
In general, prefer the
preload: :author form unless there’s a specific reason (filtering, etc) to work harder.
If you join the author then you’ll get 1000 author records transfered from the db, which are deduplicated within elixir to two author structs. When using preloads only the two authors will be transfered and loaded.