I have 4 schemas: A, B, C, D. These are nested within each other in the following way:
A has many B
B has one C
C has many D
For each A, I want the following preload conditions:
preload all B's belonging to A
preload the C that belongs to each B
preload only one of the D's that belong to each C, and have this filter based on the value of a field in B
My data is very large and I’d like to do this in separate preload queries as opposed to a single join. My main difficulty is referencing a value from B inside the preload for D. Is there a way to do this?
You can reference B inside the preload from D using a preload query but, short of denormalizing your data, I’m not sure how it could be done without a join between at least B, C, and D.
@baldwindavid I’m running into some unexpected behaviour. I wonder if you can spot something I’m doing wrong.
Here is my query:
preload_query =
from(b in B,
left_join: c in C,
on: b.c_id == c.id,
left_join: d in D,
on: c.id == d.c_id and b.special_col == d.special_col,
preload: [c: {c, d: d}]
)
from(a in A, preload: [b: ^preload_query])
If there is only one b in the preload query then everything works normally, i.e. I only get the d where b.special_col = d.special_col. I’d get something like this:
However, if there are multiple b's in the preload query, then an aggregated list of all the d's is populated for each b instead of splitting them up. It looks like it essentially ignores b.special_col = d.special_col in this case. I’d get something like this:
Here is the preload query’s sql. When I put this into postgres manually, the output looks good to me.
SELECT
b0."id", b0."c_id", b0."special_col",
c1."id",
d2."id", d2."c_id", d2."special_col",
b0."id"
FROM "B" AS b0
LEFT OUTER JOIN "C" AS c1 ON b0."c_id" = c1."id"
LEFT OUTER JOIN "D" AS d2 ON (c1."id" = d2."c_id") AND (d2."special_col" = b0."special_col")
WHERE (b0."id" = ANY($1)) [[1, 2, 3]]
There is one strange thing, b0."id" is selected twice but I don’t know if this is related to my problem. That’s the field being used in the association with table A.
It looks like my issue is happening when Ecto sorts the associations after getting the query result from the db. I’m not sure if there’s anything I can do to modify this behaviour.
Let’s say hypothetically that each Student can only enroll in one Course
Now my Ecto query is:
preload_query =
from(student in Student,
left_join: college in College,
on: student.college_id == college.id,
left_join: course in Course,
on: college.id == course.college_id and student.course_name == course.name,
preload: [colleges: {college, courses: course}]
)
from(state in State, preload: [students: ^preload_query])
SQL for preload query:
SELECT
students."id", students."college_id", students."course_name",
colleges."id",
courses."id", courses."college_id", courses."course_name",
students."id"
FROM "students"
LEFT OUTER JOIN "colleges"
ON students."college_id" = colleges."id"
LEFT OUTER JOIN "courses"
ON (colleges."id" = courses."college_id") AND (courses."course_name" = students."course_name")
WHERE (students."id" = ANY($1)) [[1, 2, 3]]
My problem is that when there is more than one Student belonging to the same College, all of their Courses are preloaded into the %College{} struct for each of those Students. The Student's course_name is not filtering anything out, even though I put it in the join condition.
This makes me think Ecto is not using my join condition to assign Courses to Colleges. It’s simply taking every Course that belongs to a given College and lumping them together, even if my join condition says they should be split by Student.
@joey_the_snake Did select get you the data you need? To be clear, I was saying to replace the preload attribute with select as in:
preload_query =
from(student in Student,
left_join: college in College,
on: student.college_id == college.id,
left_join: course in Course,
on: college.id == course.college_id and student.course_name == course.name,
select: %{
student: student.name,
college: college.name,
course: course.name
}
)
from(
state in State,
preload: [students: ^preload_query]
)
|> Repo.all()
You can choose whatever data you need in select and the query should return State structs with a students array with those custom data structs. Sometimes it is just easier to select what you want rather than relying on preload.