Let’s say I write an ecto query that fetches 1000 posts and preload the author field. Lets say the 1000 posts are all written by 2 authors. How many author structs are created? It is just the 2? Or is it one for each post? How many times is the author data returned from the db?
To get specific, lets say we write the query like this:
from(p in Post, where: p.id in ^post_ids, preload: :author)
Is it more efficient to join the author first?
from(
p in Post,
where: p.id in ^post_ids,
join: a in Author, on: a.id == p.author_id,
preload: author: a
)
Or maybe to dedup the authors we need to group by author? I’ll probably end up testing all this, I just wanted to see what people thought.
1 Like
Preface: I’m assuming that Post
has a belongs_to
relationship to Author
The preload: :author
form is going to do a three-step process:
- load the
Post
structs
- collect all the
author_id
s from all the Post
s
- load those with one query like
id IN (...)
and fill in the author
fields on all the Post
s
So there’s one Author
struct per distinct author in the results for this approach.
Each Post
’s fields appear on the wire once, as do each Author
’s.
In the explicit join case, things go differently:
- the query returns all the post and author columns together.
- Ecto builds
Post
structs from the results and caches Author
s as they are seen (IIRC)
There’s also only one Author
struct per distinct author for this approach. See this thread for an example where that wasn’t what the poster wanted
Each Post
s fields appear on the wire once, but each Author
’s fields appear once per Post
they are the author of.
In general, prefer the preload: :author
form unless there’s a specific reason (filtering, etc) to work harder.
If you join the author then you’ll get 1000 author records transfered from the db, which are deduplicated within elixir to two author structs. When using preloads only the two authors will be transfered and loaded.
2 Likes