jmurphyweb
Advantages of Ecto preloads with/without joins
I have a slow query as my data has grown. I wanted to test joining before preloads to see if i could speed it up.
I did a test on some mock data, first with the join and the second without. It seems like the speed gain is minimal.
Are there any other benefits to joining first before preloads?
Is it more efficient for the databases memory usage if we join before preloading?
If I did some additional nested preloads would that start to affect things?
With join
iex(1)> query = from tc in Timecard, join: tc_d in Timecards.TimecardData, on: tc.id == tc_d.timecard_id, preload: [timecard_data: tc_d]
#Ecto.Query<from t0 in Ev2.Timecards.Timecard,
join: t1 in Ev2.Timecards.TimecardData, on: t0.id == t1.timecard_id,
preload: [timecard_data: t1]>
Has the following output:
iex(2)> Repo.all(query)
[debug] QUERY OK source="timecards" db=1242.5ms decode=760.5ms
SELECT t0."id", t0."date", ..., t0."offer_id", t0."timecard_setting_id", t0."inserted_at", t0."updated_at", t1."id", t1."approver_id", ... , t1."updated_at" FROM "timecards" AS t0 INNER JOIN "timecard_data" AS t1 ON t0."id" = t1."timecard_id" []
Which looks like 1243ms.
Without join
Whereas the following with no joins and super simple syntax:
iex(3)> q2 = from tc in Timecard, preload: :timecard_data
#Ecto.Query<from t in Ev2.Timecards.Timecard, preload: [:timecard_data]>
Has the following output:
iex(4)> Repo.all(q2)
[debug] QUERY OK source="timecards" db=321.0ms decode=116.9ms
SELECT t0."id", t0."date", t0."week_ending_date", t0."datetime_submitted", t0."auto_submitted?", t0."auto_generated?", t0."employee_id", t0."proxy_creator_id", t0."offer_id", t0."timecard_setting_id", t0."inserted_at", t0."updated_at" FROM "timecards" AS t0 []
[debug] QUERY OK source="timecard_data" db=965.8ms decode=322.6ms
SELECT t0."id", t0."approver_id", ... , t0."timecard_id" FROM "timecard_data" AS t0 WHERE (t0."timecard_id" = ANY($1)) ORDER BY t0."timecard_id" [[15716, 15693, ...]]
Which to me looks like a total of 321 + 965 = 1286ms
Thanks for any input!
Most Liked
hubertlepicki
I think the issue with join-preloads can arise when you are attempting to preload has_many or many-to-many relationships. For these, the resulting SQL will output the same column values multiple times. If these are large text/blobs, the amount of data to be transferred over the wire quickly grows. Ecto deduplicates that on it’s end when returing data to the user so it is largely invisible, but joins behind the scenes still create this huge result that has to be moved from DB to Elixir in first place.
So, I think it makes total sense to preload something like belongs_to with joins, and possibly associations that just have a few related (small) records could be beneficial but for has_many associations probably you generally don’t want to do join-preloads.
check out this post: Composing Ecto queries: filters and preloads | AmberBit Sp. z o. o. and discussion I had with @michalmuskala on twitter: https://twitter.com/michalmuskala/status/1118120234991403009
hubertlepicki
This operation is very likely to be faster, especially as the number of records grow larger.








