We are having a friendly discussion at work about whether it is better to always preload or to let it crash.
For illumination, consider these 2 variations of function which lives in some Chats module:
Option 1
def is_safe?(chat) do
Enum.all?(chat.messages, &safe_message?/1)
end
Option 2
def is_safe?(chat) do
chat = Repo.preload(chat, :messages)
Enum.all?(chat.messages, &safe_message?/1)
end
Some people prefer option 1 because it avoids a situation where we might be calling Repo.preload without realizing it. They believe the function should simply crash if preloads are missing, putting the onus on the developer to ensure all necessary preloads are present. The stipulation here is that the developer must write tests to ensure actual user flows that call this function are preloading before calling it.
Other people prefer option 2 because it avoids crashing. Crashing LiveViews can result in a crash loop which hammers the database.
Personally I prefer option 1 but I think the choice is quite subjective.
I am simply asking for opinions. I do not expect a “one true way” answer or overall consensus. Scribble your baseless reckonings with complete abandon.
Not a direct answer but to me there is nothing subjective about having side effects in a predicate function. If I’m just asking a question I don’t expect anything to happen, ever.
I’ve done both depending on context but one criticism I’d make of this example option one is that a guard would be more explicit and result in a better error.
Option 1 seems like a bad design for a public function, for a private one is fine. I would expect for functions to be as idempotent as possible.
Also, it’s not uncommon at all to call Repo.preload/3 multiple times, and the behavior is clearly documented:
In case the association was already loaded, preload won’t attempt to reload it. Preload assumes each association has the same nested associations already loaded. If this is not the case, it is possible to lose information. For example:
You’re right, it’s not (really). I just much prefer option 1, I suppose. I’d need more context but generally prefer necessary data to be known upfront. I could also see myself doing this all in the db (again, need more context). I also rarely use conditional pre-loads (ie, context functions either preload or they don’t), although I never considered that Repo.preload is a no op if already loaded.
It’s funny that my immediate reaction was that there is nothing subjective about this. Clearly option 2 is the correct answer.
Now I definitely see your point, but there is a big issue with this approach. It creates hidden coupling. When I’m wondering why a piece of code preloads an association that doesn’t seem to be used, it takes me quite a while to figure out that it is actually used by a function five layers away from the place that does the preload.
As a consequence, it is almost impossible to remove a preload, because who knows if it actually isn’t used any more.
Option 1 seems like a bad design for a public function
This is a good point. Why not pass messages?
I do like al2o3cr’s approach though and I might steal it. I tend to be defensive, like Option 2, but I hate the coupling. It’s very AR-like and you could easily end up littering the codebase with DB calls. Through tests I attempt to make sure the records are already preloaded so it’s always a no-op, but alerts would be handy in the event I missed something.
Ya, this type of thing is something I’ve never been fully comfortable with. The way I work does end up crating some couple between front and backend, but that’s the trade-off I’m most comfortable with as I don’t work on massive apps.
I avoid this by only preloading in two situations: for nested forms and when when the record doesn’t make sense without the association. A simple real-life example I had was Artwork and Artist. There was no situation in the app where I would want an Artwork or list of them without knowing their Artist. On the other hand, if I wanted a list of artworks for an artist, I have a named domain function for this, ie, Artwork.list_by_artist(artsist), even if the implementation is effectively just Repo.preload(artist, :artworks).artworks.
Now that I’m more awake, though, I think this question isn’t really necessarily about preloads which is what threw me. If you are actually concerned about messages not being present, then I’d personally always go to the database and push the logic there, ie, a db query that returns a boolean as to if all messages are safe or not. Otherwise, you could avoid this confusion altogether by changing the function to accept a list of messages, then there is no question what’s needed and it doesn’t matter how you get the messages. But again, would need more context of the actual scenario.
Might just be the fault of a trivial example. When I run into this issue it’s typically a lot more involved than just mapping over a single collection that may not be loaded. For trivial stuff I’d look to the database as someone else mentioned.
But for the given example, yes you would need to preload outside of this function. So 1, but the difference is that the requirement is no longer hidden behind the function sig. From the caller’s perspective, you might find that one of these is more likely to result in a runtime error in practice. I’m more likely to forget about preloading when the required structure isn’t in the function sig.
Dialyzer appears to also check the success typing, so attempting to pass chat.messages without preloading leads to an error and that is resolved when preloading. Though I just put this together real quick so might be missing something.
incompatible types given to Comms.is_safe?/1:
Comms.is_safe?(chat.messages)
given types:
-%Ecto.Association.NotLoaded{__cardinality__: :many, __field__: :messages, __owner__: Chat}
I think this is a great question and we should have more discussion of this kind in Elixir community.
In my opinion the question could be rephrased as: are you ok with this function having a dependency on the Repo, that is a low-level persistence concern? Does it fit this place in you application?
In general, I think Ecto.Repo tends to leak all over the application, because we are passing around Ecto schemas, on which it’s possible to call functions like Repo.preload. I sometimes wish it was not called Ecto.Repo and would make way to have proper repositories in the app that return structs that are completely disconnected from the persistence concerns. But this is just a digression.
I like the heuristic that predicate function should not generally have side effects. Because a database call definitely is a side effects (in terms of functional programming, pure functions etc.), even if not a very “dangerous” one. That being said, there might be case in which having side effects for such functions is ok. I’m just not sure how this is in your case.
Imo, better to crash. It’s a bug and should be fixed asap. Sure you’re hammering the database while going on, but it seems to weird to optimize for the expectation of a bug happening.
In our app, we try to make clear distinctions between data loading and business logic; the latter assumes all the necessary data has been loaded.
That said, it’s nice to have conveniences, especially in a REPL, so sometimes we have functions like this (notice the preloading is opt-in):
def do_something(model, opts \\ []) do
model = if Keyword.get(opts, :preload, false) do
Repo.preload(model, [:bunch, :of, :stuff])
else
model
end
...
end
Our application code shouldn’t ever say preload: true, but it’s ok to say it on a REPL.
Since I haven’t heard anyone speak up in favor of option 2, and on the contrary it sounds like many people simply consider it to be objectively “wrong,” I’ll say that I have made the explicit decision to preload in various deeply nested private functions, which, in sufficiently complex code, would require the root caller to preload 7-8 different associations. In practice this tends to produce a lot of annoying bugs during refactoring when the “dependencies” of the nested functions change and the committer forgets to update the query in the caller or updates it incorrectly. When the inefficiencies don’t cause sufficient slowness to affect the behavior of the system, I consider it to be a perfectly fine trade off. When the time comes to optimize for efficiency, I can pull them up the call chain then.
I default to loading then processing as two functions steps but there are cases where that may not make sense. Like many other things it just depends on context. Even for option 1 could have an explicitly named function that loads a chat with the messages. I’ve worked on multiple projects where we would have something that was used very frequently and pretty much always needed the association so I’d create an explicitly named function that would always load the struct with the specific association.
I am in favor of option 2, even though I didn’t mention it directly, it results from the argument of having idempotent functions.
If we put aside the issue with n+1 queries, which I think can only be assessed by analyzing performance, the option 2 offers a solution that always works, no matter where that function is called from, which as you mentioned is a very important factor when you are doing refactors or introducing new functionality.
I also think that this conversation derailed a little bit beyond the question, as overall architectural designs that led to these 2 options are beyond the discussion at hand, it might be possible that refactoring that part is out of the question at this point in time.
Thanks for all the replies everyone. I really appreciate you all fearlessly sharing your thoughts. Trying to see it from everyone’s perspective is helping me think about this question in an abstract way with much more clarity.
I know there is Ecto.assoc_loaded?/1… Would it make sense to have some standard assoc getter functions (e.g. get_assoc, fetch_assoc, fetch_assoc!) to make it easier to work with assocs? (Not too difficult to write manually, I suppose, but wondering if this idea makes sense to have as a standard feature…)
I am heavily in the option 2 camp. No function should just crash. That means that pre loading ahead of time is only required to prevent n+1 queries.
In fact it can be incredibly beneficial in the repl in prod if functions fetch the data they need on demand, allowing you to get ad-hoc functionality as needed.
Requiring preloads up front is onerous and in a sufficiently deep schema it becomes difficult to know up front which preloads are even used, so you end up wasting a ton if resources querying for data you never use.
The only caveat i would make is that it can be difficult in ecto to find which function actually executed a query when you do this, so I do wish I could annotate queries in ecto at the top level to track down such problems.