Is it "better" to crash when missing preloads or to always ensure preloads assocs?

slouchpie · May 13, 2025, 12:55pm

Greetings comrades

We are having a friendly discussion at work about whether it is better to always preload or to let it crash.

For illumination, consider these 2 variations of function which lives in some Chats module:

Option 1

def is_safe?(chat) do
  Enum.all?(chat.messages, &safe_message?/1)
end

Option 2

def is_safe?(chat) do
  chat = Repo.preload(chat, :messages)
  Enum.all?(chat.messages, &safe_message?/1)
end

Some people prefer option 1 because it avoids a situation where we might be calling Repo.preload without realizing it. They believe the function should simply crash if preloads are missing, putting the onus on the developer to ensure all necessary preloads are present. The stipulation here is that the developer must write tests to ensure actual user flows that call this function are preloading before calling it.

Other people prefer option 2 because it avoids crashing. Crashing LiveViews can result in a crash loop which hammers the database.

Personally I prefer option 1 but I think the choice is quite subjective.

I am simply asking for opinions. I do not expect a “one true way” answer or overall consensus. Scribble your baseless reckonings with complete abandon.

sodapopcan · May 13, 2025, 1:05pm

Not a direct answer but to me there is nothing subjective about having side effects in a predicate function. If I’m just asking a question I don’t expect anything to happen, ever.

tfwright · May 13, 2025, 1:05pm

I’ve done both depending on context but one criticism I’d make of this example option one is that a guard would be more explicit and result in a better error.

D4no0 · May 13, 2025, 1:07pm

Option 1 seems like a bad design for a public function, for a private one is fine. I would expect for functions to be as idempotent as possible.

Also, it’s not uncommon at all to call Repo.preload/3 multiple times, and the behavior is clearly documented:

In case the association was already loaded, preload won’t attempt to reload it. Preload assumes each association has the same nested associations already loaded. If this is not the case, it is possible to lose information. For example:

tfwright · May 13, 2025, 1:11pm

Is a dB query really a side effect? I feel like the latter term can be a bit fuzzy, but I’ve always seen this just as a performance issue…

sodapopcan · May 13, 2025, 1:30pm

You’re right, it’s not (really). I just much prefer option 1, I suppose. I’d need more context but generally prefer necessary data to be known upfront. I could also see myself doing this all in the db (again, need more context). I also rarely use conditional pre-loads (ie, context functions either preload or they don’t), although I never considered that Repo.preload is a no op if already loaded.

al2o3cr · May 13, 2025, 1:40pm

One hazard with option 2 is that if somebody decides to call is_safe? for each chat in a list, you’re right back in N+1 query trouble.

A solution I’ve used in production was to have a helper called ensure_preload that behaves differently in production vs test:

in test, it crashes with a detailed message about what’s missing if the preloaded data isn’t there
in production, it does the requested preload and logs a message (and/or sends a notification to your exception tracker)

That way production is never broken (just slow) but code that’s covered by tests is verified to work efficiently.

Sorc96 · May 13, 2025, 3:01pm

It’s funny that my immediate reaction was that there is nothing subjective about this. Clearly option 2 is the correct answer.

Now I definitely see your point, but there is a big issue with this approach. It creates hidden coupling. When I’m wondering why a piece of code preloads an association that doesn’t seem to be used, it takes me quite a while to figure out that it is actually used by a function five layers away from the place that does the preload.

As a consequence, it is almost impossible to remove a preload, because who knows if it actually isn’t used any more.

codyjroberts · May 13, 2025, 3:13pm

Option 1 seems like a bad design for a public function

This is a good point. Why not pass messages?

I do like al2o3cr’s approach though and I might steal it. I tend to be defensive, like Option 2, but I hate the coupling. It’s very AR-like and you could easily end up littering the codebase with DB calls. Through tests I attempt to make sure the records are already preloaded so it’s always a no-op, but alerts would be handy in the event I missed something.

D4no0 · May 13, 2025, 3:29pm

Edit.

Not entirely sure how that would look like. Care to show an example? You would need to preload those messages anyway, so it would look like option 1?

sodapopcan · May 13, 2025, 3:34pm

Ya, this type of thing is something I’ve never been fully comfortable with. The way I work does end up crating some couple between front and backend, but that’s the trade-off I’m most comfortable with as I don’t work on massive apps.

I avoid this by only preloading in two situations: for nested forms and when when the record doesn’t make sense without the association. A simple real-life example I had was Artwork and Artist. There was no situation in the app where I would want an Artwork or list of them without knowing their Artist. On the other hand, if I wanted a list of artworks for an artist, I have a named domain function for this, ie, Artwork.list_by_artist(artsist), even if the implementation is effectively just Repo.preload(artist, :artworks).artworks.

Now that I’m more awake, though, I think this question isn’t really necessarily about preloads which is what threw me. If you are actually concerned about messages not being present, then I’d personally always go to the database and push the logic there, ie, a db query that returns a boolean as to if all messages are safe or not. Otherwise, you could avoid this confusion altogether by changing the function to accept a list of messages, then there is no question what’s needed and it doesn’t matter how you get the messages. But again, would need more context of the actual scenario.

codyjroberts · May 13, 2025, 7:36pm

Might just be the fault of a trivial example. When I run into this issue it’s typically a lot more involved than just mapping over a single collection that may not be loaded. For trivial stuff I’d look to the database as someone else mentioned.

But for the given example, yes you would need to preload outside of this function. So 1, but the difference is that the requirement is no longer hidden behind the function sig. From the caller’s perspective, you might find that one of these is more likely to result in a runtime error in practice. I’m more likely to forget about preloading when the required structure isn’t in the function sig.

Dialyzer appears to also check the success typing, so attempting to pass chat.messages without preloading leads to an error and that is resolved when preloading. Though I just put this together real quick so might be missing something.

incompatible types given to Comms.is_safe?/1:

    Comms.is_safe?(chat.messages)

given types:

    -%Ecto.Association.NotLoaded{__cardinality__: :many, __field__: :messages, __owner__: Chat}

katafrakt · May 13, 2025, 8:54pm

I think this is a great question and we should have more discussion of this kind in Elixir community.

In my opinion the question could be rephrased as: are you ok with this function having a dependency on the Repo, that is a low-level persistence concern? Does it fit this place in you application?

In general, I think Ecto.Repo tends to leak all over the application, because we are passing around Ecto schemas, on which it’s possible to call functions like Repo.preload. I sometimes wish it was not called Ecto.Repo and would make way to have proper repositories in the app that return structs that are completely disconnected from the persistence concerns. But this is just a digression.

I like the heuristic that predicate function should not generally have side effects. Because a database call definitely is a side effects (in terms of functional programming, pure functions etc.), even if not a very “dangerous” one. That being said, there might be case in which having side effects for such functions is ok. I’m just not sure how this is in your case.

cjbottaro · May 13, 2025, 10:20pm

Imo, better to crash. It’s a bug and should be fixed asap. Sure you’re hammering the database while going on, but it seems to weird to optimize for the expectation of a bug happening.

In our app, we try to make clear distinctions between data loading and business logic; the latter assumes all the necessary data has been loaded.

That said, it’s nice to have conveniences, especially in a REPL, so sometimes we have functions like this (notice the preloading is opt-in):

def do_something(model, opts \\ []) do
  model = if Keyword.get(opts, :preload, false) do
    Repo.preload(model, [:bunch, :of, :stuff])
  else
    model
  end
  ...
end

Our application code shouldn’t ever say preload: true, but it’s ok to say it on a REPL.

tfwright · May 14, 2025, 3:07am

Since I haven’t heard anyone speak up in favor of option 2, and on the contrary it sounds like many people simply consider it to be objectively “wrong,” I’ll say that I have made the explicit decision to preload in various deeply nested private functions, which, in sufficiently complex code, would require the root caller to preload 7-8 different associations. In practice this tends to produce a lot of annoying bugs during refactoring when the “dependencies” of the nested functions change and the committer forgets to update the query in the caller or updates it incorrectly. When the inefficiencies don’t cause sufficient slowness to affect the behavior of the system, I consider it to be a perfectly fine trade off. When the time comes to optimize for efficiency, I can pull them up the call chain then.

paulsullivanjr · May 14, 2025, 11:07am

I default to loading then processing as two functions steps but there are cases where that may not make sense. Like many other things it just depends on context. Even for option 1 could have an explicitly named function that loads a chat with the messages. I’ve worked on multiple projects where we would have something that was used very frequently and pretty much always needed the association so I’d create an explicitly named function that would always load the struct with the specific association.

D4no0 · May 14, 2025, 11:33am

I am in favor of option 2, even though I didn’t mention it directly, it results from the argument of having idempotent functions.

If we put aside the issue with n+1 queries, which I think can only be assessed by analyzing performance, the option 2 offers a solution that always works, no matter where that function is called from, which as you mentioned is a very important factor when you are doing refactors or introducing new functionality.

I also think that this conversation derailed a little bit beyond the question, as overall architectural designs that led to these 2 options are beyond the discussion at hand, it might be possible that refactoring that part is out of the question at this point in time.

slouchpie · May 15, 2025, 2:05am

Thanks for all the replies everyone. I really appreciate you all fearlessly sharing your thoughts. Trying to see it from everyone’s perspective is helping me think about this question in an abstract way with much more clarity.

arcanemachine · May 15, 2025, 7:39am

I know there is Ecto.assoc_loaded?/1… Would it make sense to have some standard assoc getter functions (e.g. get_assoc, fetch_assoc, fetch_assoc!) to make it easier to work with assocs? (Not too difficult to write manually, I suppose, but wondering if this idea makes sense to have as a standard feature…)

mindreader · May 19, 2025, 8:12am

I am heavily in the option 2 camp. No function should just crash. That means that pre loading ahead of time is only required to prevent n+1 queries.

In fact it can be incredibly beneficial in the repl in prod if functions fetch the data they need on demand, allowing you to get ad-hoc functionality as needed.

Requiring preloads up front is onerous and in a sufficiently deep schema it becomes difficult to know up front which preloads are even used, so you end up wasting a ton if resources querying for data you never use.

The only caveat i would make is that it can be difficult in ecto to find which function actually executed a query when you do this, so I do wish I could annotate queries in ecto at the top level to track down such problems.