I think my preloads are 'looping' over each other, how do I manage them correctly?

In my current project my database requires a few associations between tables that ultimately end up belonging to a record in a ‘customers’ table. I’m finding it a real struggle to maintain the series of nested pre-loads my current understanding says I need and in some cases, finding they are creating loops that can crash the server. The table structure is like this:

Customers
  x Items
    - provider
    - type
    - status
    - customer
    x action
    x note
  x Events
    - source
    - status
    - grade
    - specialty
    - destination
    - customer
    x decisions
    x items
  x Decisions
    x note
    - customer

where - means it belongs_to and x is a has_many.

If I make a call for a customer I have to preload all of the second level tables (Items, Events and Decisions) and all of their sub tables or I get errors about associations not being loaded. As you can see, sometimes this includes pre-loading the customer within one of those pre-load sub queries which just doesn’t feel right at all.

I can choose not to return this extra customer in the JSON by having a different template in the View but if I don’t preload it, I get an error.

This is also a problem because some of these second level tables might have relations to each other. For example, a Decision has a has_many relationship with Event. This looks like it creates an even bigger pre-loading loop and causes my docker container to crash out.

I’m clearly doing something wrong and I’m not sure if it’s in my understanding of preloading (which is poor but the documentation I find very hard to understand), my underlying table strucure (which I understand would be out of the scope of this forum) or something else. It doesn’t look like you can choose NOT to preload an item so i’m a bit stuck.

Note: These are typically currently called without using join syntax as I found it difficult to understand with so many nested preloads.

It would help to see your code, the error you’re having or why your Docker container might be crashing. All you describe is in theory possible, and I’m pretty sure Ecto avoid circular preloads (by keeping a record of all the IDs it has to load from a specific table).

sometimes this includes pre-loading the customer within one of those pre-load sub queries which just doesn’t feel right at all

You don’t have to preload. You preload if you’re 80% sure you’ll use the information you have preloaded together with the original piece of data. Otherwise it might be worth just loading what’s needed where, especially with a database tree this thick.

Additionally, your database structure is… peculiar, if I understand your comment correctly:

Customers
  x Items
    [snip]
    - customer
    [snip]
  x Events
    [snip]
    - customer
    [snip]
  x Decisions
    [snip]
    - customer

How does a customer have many items/events/decisions that have one (other) customer? Either the graph is incorrect, or the database structure needs some rethinking.

EDIT: This might be a X-Y problem. What are you trying to achieve in the first place? What kind of query are you trying to express?

1 Like

So at the moment a Customer record has a has_many relationship with Items.

Items also has a belongs_to relationship with a Customer record.

This is because sometimes I may call on the Customer in its entirety and sometimes I may just want to take the Item through its own REST methods but with the Customer attached. I think this is where I am principally causing the problems.

You don’t have to preload. You preload if you’re 80% sure you’ll use the information you have preloaded together with the original piece of data. Otherwise it might be worth just loading what’s needed where, especially with a database tree this thick

If I’m calling a Customer record like this:

     Repo.one(
       from(c in Customer,
         where: p.id == ^an_id,
         preload: [^preloads]
       )
     )

Then ^preloads always seems to require the full tree or I generate errors about associations not being loaded, so in this case it would be something like:

      items: [:customer, :provider, :type, :status, :action, :note],
      events: [
        :source,
        :status,
        :grade,
        :specialty,
        :destination,
        :customer,
        decision: :note,
        tasks: :task
      ],
      decisions: [:customer, :note]

Then in the view I might choose to include a render_one or render_many depending on if I want to use the value - which even as I type sounds stupid because I’ve already done all the work to preload them!

I think what I need to do is avoid having that :customer preload in each second level table when I am trying to retrieve a customer record. It was my (probably faulty) assumption though that the presence of the belongs_to in the scheme for Customers necessitates that preload.

…although thinking about it as I’m typing - that error is probably caused by the render_many or render_one in the View rather than ecto running the query.

Items also has a belongs_to relationship with a Customer record

If you have a Customer and are preloading its items, you don’t have to preload the customer back, as the relationship is bidirectional: a customer has many items, an item has one customer.

Your preload tree, if I understand your database structure correctly should be

      items: [:provider, :type, :status, :action, :note],
      events: [
        :source,
        :status,
        :grade,
        :specialty,
        :destination,
        decision: :note,
        tasks: :task
      ],
      decisions: [:note]

But this still need to be simplified, as it’s loading from at least 19 tables and a possible unbounded amount of rows. Your Docker container might be crashing simply because it’s running out of memory.

1 Like

I think you’re right and rubber ducking it here has brought to light a few things I need to tighten up further down the line.

A lot of those sub tables really exist to populate drop down menus in such a way that they can be updated without having to update the front end application. This currently reports back to the API as saving a (for example) provider_id as part of a new/updated items record.

Maybe it would be better to just bake those values into the items table. For example adding a provider_name field instead and removing the need to pre-load at all.

Thanks for the help and cutting through my garbled explanations. You’ve been super helpful.

1 Like