Performance implications of Ecto within-query preloads with postgres

marick · July 9, 2020, 8:40pm

I know not to worry about performance before getting evidence of a problem, but I’m idly curious about preloads. (Note: I have fairly little experience with databases.)

I have this query:

#Ecto.Query<from a0 in Crit.Schemas.Animal, where: a0.id == ^"2",
 order_by: [asc: a0.name], distinct: [asc: a0.name],
 preload: [[:species, :service_gaps]]>

When that runs, the log shows three queries:

[debug] QUERY OK source="animals" db=0.5ms idle=1437.9ms
SELECT ... FROM "demo"."animals" WHERE (a0."id" = $1)  [2]

[debug] QUERY OK source="species" db=1.0ms idle=1438.7ms
SELECT ... FROM "demo"."species" ... WHERE (s0."id" = $1) [1]

[debug] QUERY OK source="service_gaps" db=1.5ms idle=1438.7ms
SELECT ... FROM "demo"."service_gaps" WHERE (s0."animal_id" = $1) [2]

Am I correct in thinking that’s three round trips to Postgres?

benwilson512 · July 9, 2020, 9:21pm

It does it in 3 distinct queries yes, although notably if you aren’t in a transaction it will query the preloads in parallel.

mindok · July 10, 2020, 12:26am

Also, the main point is that you avoid the n+1 problem which can be a significant source of unintended database traffic.

josevalim · July 10, 2020, 6:52am

Correct. Keep in mind that this is often the best way to load one to many associations because of the way SQL database return data. If you have three tables, all associated in one to many, and each table returns respectively K, M, N entries, you will get overall K * M * N rows. That ends up with a lot of duplication, which translates to more data over the wire and more work decoding the data. When using preloads, you get K + M + N rows, which ends up being more performant.

If you want to force to actually go the join route, you can do so too:

from q in query, join: s in assoc(q, :species), join: sg in assoc(q, :service_gaps), preload: [species: s, service_gaps: sg]