SQL will always return tabular data, no matter how hard one tries to think of it as nested or treelike.
So given a row with book details you add columns for person data using a join. But you only get enough columns for a single joined person. If you want to join another person you need to do another join, which will one more set of columns to the resulting data.
That’s why your preload doesn’t work as you expect. You’re joining either an author or an illustator and then telling ecto to use that single set of data for both relations.
If you want to join another person you need to do another join, which will one more set of columns to the resulting data.
Not sure about that. I know my approach isn’t right but I’ll leave some notes for other readers who encounter a similar situation:
This query gives me two rows per book, one with the joined author and one with the joined illustrator and only one set of Person columns.
SELECT b.*, p.*
FROM books b
LEFT JOIN people p ON p.id = b.author_id OR p.id = b.illustrator_id;
b.id
b.title
b.type
b.author_id
b.illustrator_id
p.id
p.name
1
War and Art
hardcover
1
2
1
Leo Wolfe
1
War and Art
hardcover
1
2
2
Maria Chen
2
Ocean Tales
paperback
2
4
2
Maria Chen
2
Ocean Tales
paperback
2
4
4
David Singh
All the data is right there. We don’t get a guaranteed order for the joined rows but can still match the IDs from the result set.
This kind of processing is similar to Ecto preloads with separate queries (no join), but without using a separate SELECT * FROM person WHERE person.id IN (1, 2, 3, ...) query.
This is a tradeoff between:
the cost of duplicate data transferred for Book entities
the cost of looking up a bunch of specific IDs (Ecto preload w/o join, using separate pass)
the cost of joining the same table twice
In my case, it turned out that the query above (one join) is less efficient than two joins. This means that here, the standard Ecto preload with two joins is ideal:
from(b in Book,
join: a in assoc(b, :author),
join: i in assoc(b, :illustrator),
preload: [author: a, illustrator: i]
)
|> Repo.all()
Your mileage may vary! EXPLAIN ANALYZE is your friend.
While this is generally true I doubt there’s many cases if at all where joining distinct data as separate rows over joining as separate columns will yield a performance boost. That’s something I’d only start touching as one of the later options once a query in question was proven to be a hotspot. Foreign keys on primary keys of other tables are hella fast, because the primary key is the the primary way to look up information on that table. It’s afaik always indexed and basically the first thing any db engine will make sure is fast.