Ecto: to join/select or to preload?

travisf · January 1, 2024, 8:21pm

From a performance perspective, is there any difference between:

Order
|> preload(user: [:customer])

and

Order 
|> join(:left, [o], u in assoc(o, :user))
|> join(:left, [o, u], c in assoc(u, :customer))
|> select([o, u, c], %{id: o.id, user: u.id, customer: c.id})

And if so how big a difference is there?

I like #2 because it makes for cleaner API and you don’t have to pass massive assigns around but I’m curious if the performance difference would be not negligible.

shanesveller · January 1, 2024, 8:36pm

This is not apples to apples because the latter gives you database primary keys while the former gives you hydrated structs with nested lists of associations. It will consequently take more memory and more network traffic from the DB, but you can directly work with their fields, generate changesets, etc.

Comparing the two approaches according to performance is perhaps not very helpful because your downstream usecase would inform whether you need that rich data or not. If you’re not merely using the PKs as-is, perhaps to CRUD a join table row or to populate the arguments of an Oban job, you probably would still need to fully load those rows at some point. Doing so via Ecto.Query.preload is not unambiguously faster or slower than another technique, it’s pretty situational and also slightly sensitive to the cardinality of the joins. Doing fewer round trips to the database may even be slower at times if it requires more complex query runtime.

D4no0 · January 1, 2024, 8:55pm

You can actually execute the query and analyze the logs for the generated query. Generally speaking there is no performance difference, the preload one is using associations as the base abstraction and of course abstracts and limits the way you do the join.

I rarely use preloads in a query, as I think that this construct is foreign to the query. You also lose the ability to do nice things like this:

Order 
|> join(:left, [o], u in assoc(o, :user), as: :user)
|> join(:left, [user: u], c in assoc(u, :customer), as: :customer)
|> select([o, user: u, customer: c], %{id: o.id, user: u.id, customer: c.id})

This feature is absolutely important when you have big queries, as passing a list and remember the ordering is not safe or readable.

As for preloads in general, I think preloads and associations are a good abstraction, I use Repo.preload/3 for fetching associations and custom queries with joins when I either need to optimize those joins or filter them.

shanesveller · January 1, 2024, 9:06pm

Most non-trivial queries ought to use named bindings anyhow, which are order-insensitive and much more resilient for composition. Using those is entirely orthogonal to whether you’ve used Query preload or Repo preload IME.