Nested association, conditional preload

joey_the_snake · July 13, 2020, 3:55pm

I have 4 schemas: A, B, C, D. These are nested within each other in the following way:

A has many B
B has one C
C has many D

For each A, I want the following preload conditions:

preload all B's belonging to A
preload the C that belongs to each B
preload only one of the D's that belong to each C, and have this filter based on the value of a field in B

My data is very large and I’d like to do this in separate preload queries as opposed to a single join. My main difficulty is referencing a value from B inside the preload for D. Is there a way to do this?

baldwindavid · July 13, 2020, 4:53pm

You can reference B inside the preload from D using a preload query but, short of denormalizing your data, I’m not sure how it could be done without a join between at least B, C, and D.

joey_the_snake · July 13, 2020, 4:56pm

Thanks, I think you’re right that I have to join B, C and D and then preload that into A.

joey_the_snake · July 14, 2020, 7:42pm

@baldwindavid I’m running into some unexpected behaviour. I wonder if you can spot something I’m doing wrong.

Here is my query:

preload_query =
  from(b in B,
    left_join: c in C,
    on: b.c_id == c.id,
    left_join: d in D,
    on: c.id == d.c_id and b.special_col == d.special_col,
    preload: [c: {c, d: d}]
  )

from(a in A, preload: [b: ^preload_query])

If there is only one b in the preload query then everything works normally, i.e. I only get the d where b.special_col = d.special_col. I’d get something like this:

%B{
  special_col: 'a', 
  c_id: 1, 
  c: %C{
    id: 1, 
    d: [
      %D{c_id:1, special_col: 'a'}
    ]
  }
}

However, if there are multiple b's in the preload query, then an aggregated list of all the d's is populated for each b instead of splitting them up. It looks like it essentially ignores b.special_col = d.special_col in this case. I’d get something like this:

%B{
  special_col: 'a', 
  c_id: 1, 
  c: %C{
    id: 1, 
    d: [
      %D{c_id:1, special_col: 'a'}, 
      %D{c_id:1, special_col: 'b'}, 
      %D{c_id:1, special_col: 'c'}
    ]
  }
}

where the extra d entries are the ones that belong to the other b's that were in the preload query.

baldwindavid · July 14, 2020, 7:46pm

One thing that jumps out at me is the single = for special_col.

joey_the_snake · July 14, 2020, 7:46pm

Sorry that was a typo. I also gave more examples that hopefully make the situation clearer.

baldwindavid · July 14, 2020, 9:04pm

One thing to note is that your query suggests that C has one B rather than the other way around. If C belongs to B it should have the foreign key.

joey_the_snake · July 14, 2020, 10:35pm

Thanks. I made a mistake in my original post. It should be:

A has many B
B belongs to C
C has many D

Bumppoman · July 15, 2020, 1:18am

Can you post an example of the SQL the code produces? That can often pinpoint where the Ecto DSL is going wrong.

joey_the_snake · July 15, 2020, 1:59am

Thanks.

Here is the preload query’s sql. When I put this into postgres manually, the output looks good to me.

SELECT 
  b0."id", b0."c_id", b0."special_col",
  c1."id", 
  d2."id", d2."c_id", d2."special_col",   
  b0."id" 
FROM "B" AS b0 
LEFT OUTER JOIN "C" AS c1 ON b0."c_id" = c1."id" 
LEFT OUTER JOIN "D" AS d2 ON (c1."id" = d2."c_id") AND (d2."special_col" = b0."special_col") 
WHERE (b0."id" = ANY($1)) [[1, 2, 3]]

There is one strange thing, b0."id" is selected twice but I don’t know if this is related to my problem. That’s the field being used in the association with table A.

It looks like my issue is happening when Ecto sorts the associations after getting the query result from the db. I’m not sure if there’s anything I can do to modify this behaviour.

baldwindavid · July 15, 2020, 2:07am

Are your actual tables and schemas named A, B, C, D? My brain is having a hard time making sense of these abstract names.

joey_the_snake · July 15, 2020, 3:41am

How about this:

State has many Students
Student belongs to College
College has many Courses

Let’s say hypothetically that each Student can only enroll in one Course

Now my Ecto query is:

preload_query =
  from(student in Student,
    left_join: college in College,
    on: student.college_id == college.id,
    left_join: course in Course,
    on: college.id == course.college_id and student.course_name == course.name,
    preload: [colleges: {college, courses: course}]
  )

from(state in State, preload: [students: ^preload_query])

SQL for preload query:

SELECT 
  students."id", students."college_id", students."course_name",
  colleges."id", 
  courses."id", courses."college_id", courses."course_name",   
  students."id" 
FROM "students"
LEFT OUTER JOIN "colleges" 
  ON students."college_id" = colleges."id" 
LEFT OUTER JOIN "courses" 
  ON (colleges."id" = courses."college_id") AND (courses."course_name" = students."course_name") 
WHERE (students."id" = ANY($1)) [[1, 2, 3]]

My problem is that when there is more than one Student belonging to the same College, all of their Courses are preloaded into the %College{} struct for each of those Students. The Student's course_name is not filtering anything out, even though I put it in the join condition.

This makes me think Ecto is not using my join condition to assign Courses to Colleges. It’s simply taking every Course that belongs to a given College and lumping them together, even if my join condition says they should be split by Student.

baldwindavid · July 15, 2020, 3:40pm

There is probably a simple answer to why that is happening, but I’m not sure what/why Ecto is doing it. Maybe someone here can explain that.

Regardless, I suspect you might be able to replace preload: [college: {college, courses: course}] with a select option from the returned dataset like…

select: %{
  name: student.name,
  college: college.name,
  course: course.name
}

baldwindavid · July 16, 2020, 7:04pm

@joey_the_snake Did select get you the data you need? To be clear, I was saying to replace the preload attribute with select as in:

preload_query =
  from(student in Student,
    left_join: college in College,
    on: student.college_id == college.id,
    left_join: course in Course,
    on: college.id == course.college_id and student.course_name == course.name,
    select: %{
      student: student.name,
      college: college.name,
      course: course.name
    }
  )

from(
  state in State,
  preload: [students: ^preload_query]
)
|> Repo.all()

You can choose whatever data you need in select and the query should return State structs with a students array with those custom data structs. Sometimes it is just easier to select what you want rather than relying on preload.

joey_the_snake · July 16, 2020, 8:31pm

Sorry for the late reply, I didn’t get a chance to try it out until now.

This new way you mentioned seems to work perfectly. I can really manipulate the result however I want.

Thanks for your patience and help. Have a good one!