Ecto preload conventions

marick · January 24, 2020, 8:53pm

Are there coding/naming conventions that make it more likely that when Ecto-using code is given an Animal object, it’s an Animal object with all the right associations preloaded?

My inclination is just to always preload all the Animal's fields. (In my case, I don’t yet have trees with more than two meaningful levels.) I’m curious what other people do.

NobbZ · January 24, 2020, 9:01pm

The convention is to just preload what you actually need.

dimitarvp · January 24, 2020, 9:09pm

I don’t think there could be any conventions. Any additional loads from the DB add to your response time so they should be minimised. Unless I am misunderstanding your question.

xpg · January 24, 2020, 10:03pm

That’s a really interesting question. I don’t have an answer for you, but I can share my situation with you. I am working with a code-base that has around 50 database tables, many of them related to a few central tables.
That has led to a situation where I have a few models that have a large number of associations (even some join_through associations).

To begin with, I tried to use the lazy preload approach, i.e. only preload what I need. It quickly turned out that there were preloads that I always used, so I ended up with a “default_preloads” function for each model. Next, I identified that in many places I preloaded the same associations recursively (i.e. [assoc_A: [sub_assoc_A: [:field, :field]]]), which led to yet another set of preload functions.
This has become somewhat of a mess, where it’s difficult to know whether preloading for a given association has been performed or not. The result is that almost every function working on the data starts by preloading what it needs, and often preloads more than it needs. Also, once you start preloading an association, you need to consider how deep the preloading should go.
When introducing new developers to the code base, they have a difficult time figuring out what is supposed to be preloaded when. That is not a good situation.
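
For concreteness, the “default_preloads” and nested-preload functions described above might look roughly like the sketch below. It is plain Elixir so it runs without a database. The module name `AnimalPreloads` and every association name in it are made up for illustration; in a real application the resulting list would be handed to something like `Repo.preload/2`.

```elixir
defmodule AnimalPreloads do
  # Hypothetical association names, for illustration only.
  # These are the preloads that turned out to be needed almost everywhere.
  def default, do: [:species, :owner]

  # A nested spec in the shape Ecto's preload functions accept:
  # plain atoms plus keyword entries for deeper levels.
  def with_reservations do
    default() ++ [reservations: [:procedures, :institution]]
  end
end

preloads = AnimalPreloads.with_reservations()

# With Ecto available this could be used as (assumed names):
#   Repo.preload(animal, preloads)
IO.inspect(preloads)
```

Every new combination tends to become another function like `with_reservations/0`, which is how the number of preload functions grows.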

The approach I am trying to take now is to identify which associations can really be considered as a required part of the model, and are so often needed that they may as well be loaded from the beginning.
For the remaining associations, I am considering removing them from the model, loading them with explicit loading functions, and passing them along beside the model struct. This way it becomes much clearer what data a function needs, and I can avoid having pure functions suddenly needing to perform database reads in order to fetch the required data.

Imagine having a blog system, where it is possible to add comments. The comments for each blog post do not really need to be loaded as part of the blog post itself. Of course having them as an association makes it easy to load them when needed. However, once the blog post has 10 more associations it becomes tricky to figure out what is preloaded and what should be preloaded.
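
A minimal sketch of that blog example, assuming hypothetical `Post`, `Comment`, and `Blog` modules. The loader here filters an in-memory list so the sketch runs on its own; a real `load_comments` would run a database query instead.

```elixir
defmodule Post do
  defstruct [:id, :title]
end

defmodule Comment do
  defstruct [:post_id, :body]
end

defmodule Blog do
  # Explicit loader (hypothetical). A real version would query the database;
  # this stand-in filters a list that is passed in.
  def load_comments(%Post{id: id}, all_comments) do
    Enum.filter(all_comments, &(&1.post_id == id))
  end

  # Pure function: the comments arrive as an argument, so it never has to
  # wonder whether an association on the post was preloaded.
  def headline(%Post{title: title}, comments) when is_list(comments) do
    "#{title} (#{length(comments)} comments)"
  end
end

all_comments = [
  %Comment{post_id: 1, body: "Nice post"},
  %Comment{post_id: 2, body: "Unrelated"},
  %Comment{post_id: 1, body: "Agreed"}
]

post = %Post{id: 1, title: "Preloading"}
comments = Blog.load_comments(post, all_comments)
IO.puts(Blog.headline(post, comments))
# prints "Preloading (2 comments)"
```

The function signature now states the data dependency, which is the clarity I am after.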

I’m not sure that the approach I am trying to take is the right one, but I can see that as our code base (and database structure) has evolved over time, things become difficult to manage.

axelson · January 24, 2020, 10:49pm

I think this makes a lot of sense. It sounds similar to the approach of Domain Driven Design where you define an “Aggregate” which is:

Aggregate: A cluster of associated objects that are treated as a unit for the purpose of data changes. External references are restricted to one member of the AGGREGATE, designated as the root. A set of consistency rules applies within the AGGREGATE’S boundaries.

So for each “Aggregate” that you identify, you’d always have the same preloaded data. Yes, it won’t always be the most performant option (since you might preload data you don’t need), but the maintainability will be much higher.
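
One way to make “always the same preloaded data” checkable is a small guard per aggregate, sketched below. Ecto marks an association that has not been loaded with an `Ecto.Association.NotLoaded` struct (and ships `Ecto.assoc_loaded?/1` to test for it); this sketch matches on the `__struct__` field directly so it runs without Ecto installed. The module name `AnimalAggregate` and the association names are hypothetical.

```elixir
defmodule AnimalAggregate do
  # Hypothetical associations that belong to the aggregate.
  @members [:species, :service_gaps]

  def members, do: @members

  # Returns the aggregate members that are still unloaded on the given value.
  def missing(animal) do
    Enum.filter(@members, fn field ->
      match?(%{__struct__: Ecto.Association.NotLoaded}, Map.get(animal, field))
    end)
  end

  def complete?(animal), do: missing(animal) == []
end

# Stand-in for the placeholder Ecto puts in an unloaded association.
not_loaded = %{__struct__: Ecto.Association.NotLoaded}

animal = %{name: "Bossie", species: %{name: "bovine"}, service_gaps: not_loaded}
IO.inspect(AnimalAggregate.missing(animal))
# prints [:service_gaps]
```

A check like `complete?/1` could sit at the boundary of the code that works on the aggregate, so a partially loaded struct is caught early rather than deep inside a function.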

This is an approach I would like to move to, but since I’m using Absinthe/GraphQL/Dataloader so heavily I am generally not using preloads.

baldwindavid · January 25, 2020, 1:59am

I agree that preloading only what is needed is ideal. Trying to guess what preloads will be needed in different contexts is pretty tough. You can write a different function for every specific case, but potentially end up with a muddled context in doing so.

@marick This seems very related to a discussion we’ve been having about making various queries from the controller… TokenOperator - Dependency-free helper most commonly used for making clean keyword APIs to Phoenix context functions

I had the same questions and issues and ended up writing a plugin that allows for a consistent pattern to deal with it. The documentation of the plugin + the discussion in that thread might be helpful because there is another plugin (written by @mathieuprog) and another pattern I am experimenting with that might also be a fit for your needs.