How do I preload associations inside a changeset?

ecto

#1

TL;DR

How do we add preloaded associations to a changeset and reuse them across the several functions that need them to accumulate further changes, all before persisting anything to the DB?

Long version:

Lately I have been reworking a module with the following goals in mind:

  1. Being able to accumulate an arbitrary number of changes before a single database operation is made.
  2. Being able to add associations that depend on other associations, all inside a changeset (closely related to point 1). Using Changeset.get_field would only return the association-not-loaded value.

The second point is becoming a major pain. I have a load of chainable functions with specs like these:

@spec order_add_something(Ecto.Changeset.t(), stuff_to_add :: term()) :: Ecto.Changeset.t()

And I want to keep them that way because I do stuff like this and I love it:

Order.new(mandatory_fields)
|> Order.add_billing_address(...)
|> Order.add_shipping_address(...)
|> Order.add_line_item(...)
|> Repo.insert!

PROBLEM: Adding a line item requires an association (in this case, client) to be preloaded on the order. It’s extremely easy to just do order = changeset.data |> Repo.preload(:client), of course, but I have several separate functions that need that association. Must I really preload it in each one of them and only ever use the enriched order as a local variable?
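To make the problem concrete, here is a minimal sketch. Demo.Order and Demo.Client are hypothetical schemas, and fake_preload is a hypothetical stand-in for Repo.preload(&1, :client) so the script runs without a database (it only needs the ecto package, fetched with Mix.install). Preloading into a local variable enriches that local value only; the changeset’s own data still holds the %Ecto.Association.NotLoaded{} placeholder.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    belongs_to :client, Demo.Client
  end
end

# Hypothetical stand-in for `Repo.preload(order, :client)`; no database involved.
fake_preload = fn order ->
  %{order | client: %Demo.Client{id: order.client_id, name: "Acme"}}
end

# An order whose :client association has not been loaded.
changeset = Ecto.Changeset.change(%Demo.Order{id: 1, client_id: 7}, status: "cart")

# Preloading into a local variable only enriches that local value...
order = fake_preload.(changeset.data)
IO.inspect(order.client.name, label: "local order client")

# ...while the changeset still carries the un-preloaded data.
client_in_changeset = changeset.data.client
IO.inspect(Ecto.assoc_loaded?(client_in_changeset), label: "loaded in changeset?")
```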

Can I preload an association and update the underlying changeset so that it contains the order plus its preloaded association(s), without doing %Changeset{changeset | data: order_with_preloads}? Or is that actually the advised approach? I always assume code like this is unsafe, so I need your advice.

And finally, a quick search turned up these:


#2

It looks like the fields of the Changeset struct are documented API, so I don’t think there is any problem with just using: %Changeset{changeset | data: order_with_preloads}
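A minimal sketch of that suggestion, under the same assumptions (hypothetical Demo.* schemas, and a hand-built struct standing in for what Repo.preload/2 would return). Changes accumulated before the swap survive it, since they live in changeset.changes rather than in changeset.data.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    belongs_to :client, Demo.Client
  end
end

alias Ecto.Changeset

# A change accumulated before any preloading happens.
changeset = Changeset.change(%Demo.Order{id: 1, client_id: 7}, status: "cart")

# What `Repo.preload(changeset.data, :client)` would return (built by hand here).
order_with_preloads = %{changeset.data | client: %Demo.Client{id: 7, name: "Acme"}}

# Swap the data; the pending changes are untouched.
changeset = %Changeset{changeset | data: order_with_preloads}

IO.inspect(Changeset.get_field(changeset, :client).name, label: "client")
IO.inspect(Changeset.get_change(changeset, :status), label: "pending status change")
```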


#3

I am not sure about that. The valid? field is documented as well, but you definitely shouldn’t be touching it. I’ll still try it though, thanks for the tip.


#4

Another approach would be to require the association to be preloaded before the changeset is built. That way it is the caller’s responsibility to preload the required association before calling the changeset functions.

The upside is that the changeset functions stay simple and don’t need to make any database calls. The downside is that you must remember to preload the associations before calling them, though a combination of documentation and meaningful error messages stating which association is missing can help.
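A sketch of the preload-first approach, assuming hypothetical Demo.* schemas and a made-up discount rule for line items. The step stays pure, reads the association straight from changeset.data, and raises an ArgumentError naming the missing association; the helper name and error text are illustrative, not part of Ecto.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :discount_percent, :integer, default: 0
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :total, :integer, default: 0
    belongs_to :client, Demo.Client
  end
end

defmodule Demo.OrderSteps do
  alias Ecto.Changeset

  # Pure step (no Repo calls). Hypothetical rule: the client's discount
  # is applied to the line item price before it is added to the total.
  @spec add_line_item(Changeset.t(), non_neg_integer()) :: Changeset.t()
  def add_line_item(%Changeset{} = changeset, price) when is_integer(price) and price >= 0 do
    client = fetch_preloaded!(changeset, :client)
    discounted = div(price * (100 - client.discount_percent), 100)
    Changeset.put_change(changeset, :total, Changeset.get_field(changeset, :total) + discounted)
  end

  # Reads the association from changeset.data and fails loudly if the
  # caller forgot to preload it.
  defp fetch_preloaded!(%Changeset{data: data}, assoc) do
    value = Map.fetch!(data, assoc)

    if Ecto.assoc_loaded?(value) do
      value
    else
      raise ArgumentError,
            "#{inspect(assoc)} must be preloaded on #{inspect(data.__struct__)} " <>
              "before calling this function, e.g. Repo.preload(order, #{inspect(assoc)})"
    end
  end
end

# The caller preloaded (here: built) the client before creating the changeset.
order = %Demo.Order{client: %Demo.Client{discount_percent: 10}}

changeset =
  order
  |> Ecto.Changeset.change()
  |> Demo.OrderSteps.add_line_item(1000)
  |> Demo.OrderSteps.add_line_item(500)

IO.inspect(Ecto.Changeset.get_change(changeset, :total), label: "total")
```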


#5

This is indeed the conventional solution. The preloading, changeset call, and subsequent database call can all get wrapped within a function inside a context module too for easy re-use.
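A minimal sketch of such a context function. Demo.Orders.place_order/2 and Demo.FakeRepo are hypothetical; the repo is passed in as an argument only so the sketch runs without a database, and in an application you would call your own Repo module directly.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    field :billing_name, :string
    belongs_to :client, Demo.Client
  end
end

defmodule Demo.Orders do
  alias Ecto.Changeset

  # Context function: the preload, the pure changeset steps and the write
  # are wrapped together. `repo` would normally be your app's Ecto.Repo.
  def place_order(repo, %Demo.Order{} = order) do
    order
    |> repo.preload(:client)
    |> Changeset.change()
    |> put_billing_name()
    |> Changeset.put_change(:status, "placed")
    |> repo.update()
  end

  # Pure step: relies on :client having been preloaded above.
  defp put_billing_name(changeset) do
    client = Changeset.get_field(changeset, :client)
    Changeset.put_change(changeset, :billing_name, client.name)
  end
end

# Hypothetical in-memory stand-in for a real Repo (no database).
defmodule Demo.FakeRepo do
  def preload(%Demo.Order{} = order, :client),
    do: %{order | client: %Demo.Client{id: order.client_id, name: "Acme"}}

  def update(%Ecto.Changeset{valid?: true} = changeset),
    do: {:ok, Ecto.Changeset.apply_changes(changeset)}
end

{:ok, placed} = Demo.Orders.place_order(Demo.FakeRepo, %Demo.Order{id: 1, client_id: 7})
IO.inspect({placed.status, placed.billing_name})
```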


#6

Stupid question: is the example here the appropriate convention?


#7

Not a stupid question at all. I was just wondering at which point I have to do this. 🙂


#8

@belaustegui & @benwilson512 Thank you both. Should I gather from your comments that replacing the data field of a Changeset after it has been created is a really bad idea? I suspect so and am looking for confirmation.

I have several functions that preload various associations as the order changeset goes through different states (constructing it with a few fields, adding shipping and billing addresses, adding line items, setting various delivery fields, sending transactional emails and storing when they were sent, etc.), and I deliberately chose not to use Ecto.assoc_loaded? because I am afraid of stale data: some of the orders are carts and might live for hours or days, for example.

Preloading the same association several times in short-lived orders (like in tests) feels like a huge anti-pattern, though. What would you recommend? I can’t use Repo.transaction, because several separate functions operate on the order changeset. Maybe Ecto.Multi?

Apologies for asking for some hand-holding. I am looking for good practices in this area.

EDIT #1: /CC @michalmuskala
EDIT #2: Never mind the Repo.transaction comment. The Ecto.Multi returned by a function can be passed directly to Repo.transaction. Sorry, that was a rather clueless statement on my part.
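A sketch of that idea, assuming a hypothetical Demo.Checkout module and a made-up :confirmation_email step. The function only appends operations to the Multi; nothing reaches the database until the Multi is passed to Repo.transaction/1. Ecto.Multi.to_list/1 is used here just to inspect what was queued.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    field :email, :string
  end
end

defmodule Demo.Checkout do
  alias Ecto.Multi

  # Builds, but does not run, the whole unit of work.
  def place_order_multi(%Ecto.Changeset{} = order_changeset) do
    Multi.new()
    |> Multi.insert(:order, order_changeset)
    |> Multi.run(:confirmation_email, fn _repo, %{order: order} ->
      # Hypothetical follow-up step; it receives the inserted order.
      {:ok, %{to: order.email, order_id: order.id}}
    end)
  end
end

multi =
  %Demo.Order{}
  |> Ecto.Changeset.change(status: "placed", email: "buyer@example.com")
  |> Demo.Checkout.place_order_multi()

operation_names = Enum.map(Ecto.Multi.to_list(multi), fn {name, _op} -> name end)
IO.inspect(operation_names, label: "operations")
# In an app (hypothetical repo name): MyApp.Repo.transaction(multi)
```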


#9

So my first choice would be exactly what @benwilson512 said: try to load all the data before even entering the changeset. I really like to keep changeset functions pure (no DB access or other side effects); this makes them easy and very fast to test.

If loading it beforehand is not an option for some reason, then I think updating data is fine. I’d probably go for something like:

update_in(changeset.data, &Repo.preload(&1, :foo))

This should be safe, since the data field of a changeset is considered public and fine to update, as long as you don’t change the data type (we cache some things, like types, inside the changeset struct).
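A runnable sketch of the suggestion, again with hypothetical Demo.* schemas and fake_preload standing in for &Repo.preload(&1, :client). It checks that update_in/2 and the explicit %Changeset{changeset | data: ...} form give the same result, and adds a hypothetical update_data/2 helper that enforces the caveat above by refusing to swap in a different struct type.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    belongs_to :client, Demo.Client
  end
end

defmodule Demo.ChangesetData do
  alias Ecto.Changeset

  # Replaces changeset.data via `fun`, refusing to change the struct type,
  # since the changeset's cached types were derived from the original struct.
  def update_data(%Changeset{data: %mod{} = data} = changeset, fun) do
    new_data = fun.(data)

    if is_struct(new_data, mod) do
      %Changeset{changeset | data: new_data}
    else
      raise ArgumentError, "expected a #{inspect(mod)} struct, got: #{inspect(new_data)}"
    end
  end
end

# Hypothetical stand-in for `&Repo.preload(&1, :client)`; no database involved.
fake_preload = fn order ->
  %{order | client: %Demo.Client{id: order.client_id, name: "Acme"}}
end

changeset = Ecto.Changeset.change(%Demo.Order{id: 1, client_id: 7}, status: "cart")

# The update_in/2 form from the post...
via_update_in = update_in(changeset.data, fake_preload)
# ...matches the explicit struct-update form.
via_struct_update = %Ecto.Changeset{changeset | data: fake_preload.(changeset.data)}
# Same thing, plus a guard against swapping in a different struct type.
via_helper = Demo.ChangesetData.update_data(changeset, fake_preload)

IO.inspect(via_update_in == via_struct_update, label: "same result")
IO.inspect(Ecto.Changeset.get_field(via_helper, :client).name, label: "client")
```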


#10

@michalmuskala What I am doing right now is: my_changeset = %Changeset{my_changeset | data: order_with_preloads}. I mostly did it because that is Elixir’s documented way of updating structs (the | update syntax on an existing struct, which is supposedly faster at a lower level).

Can you think of an argument against that? Granted, update_in looks nicer, I’ll give you that!

Preloads can’t be predicted in my scenario and I don’t wish to just eagerly preload everything. The order goes through many functions I called “steps”, and we have different kinds of orders for which some of the steps are never called – so the preloads are superfluous.
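If preloads can’t be predicted, one option is to preload lazily, at most once per changeset pipeline, and keep the result in changeset.data so later steps reuse it. A minimal sketch with a hypothetical ensure_preloaded/3 helper and a counting stand-in for Repo.preload/2. As far as I know, Repo.preload/3 already skips associations that are loaded unless force: true is given, so the explicit Ecto.assoc_loaded? check mainly makes that behavior visible here. Because the check only spans one pipeline, a long-lived cart still gets fresh data as long as each request starts from an order reloaded from the database.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    belongs_to :client, Demo.Client
  end
end

defmodule Demo.Preload do
  alias Ecto.Changeset

  # Preloads `assoc` into changeset.data only when it is not loaded yet, so the
  # first step that needs it pays for the query and later steps reuse it.
  # `preload_fun` is called like Repo.preload/2: preload_fun.(struct, assoc).
  def ensure_preloaded(%Changeset{} = changeset, assoc, preload_fun) do
    if Ecto.assoc_loaded?(Map.fetch!(changeset.data, assoc)) do
      changeset
    else
      update_in(changeset.data, &preload_fun.(&1, assoc))
    end
  end
end

# Hypothetical stand-in for Repo.preload/2 that counts how often it "queries".
calls = :counters.new(1, [])

fake_preload = fn order, :client ->
  :counters.add(calls, 1, 1)
  %{order | client: %Demo.Client{id: order.client_id, name: "Acme"}}
end

# Two independent steps both ask for :client; only the first one preloads.
changeset =
  %Demo.Order{id: 1, client_id: 7}
  |> Ecto.Changeset.change(status: "cart")
  |> Demo.Preload.ensure_preloaded(:client, fake_preload)
  |> Demo.Preload.ensure_preloaded(:client, fake_preload)

IO.inspect(:counters.get(calls, 1), label: "preload calls")
```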

As one of the Ecto core team members, are you telling me that replacing the data field of a changeset (with the same type of struct, just with preloads) is safe?


#11

I would use this technique cautiously. If you have nested associations (2-3+ levels deep), constructing a big query with many joins could be slower than firing a few small, individual queries. I suspect Elixir’s/Erlang’s concurrency plays a role here 🙂
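For comparison, Ecto lets you pick the strategy per query. A minimal sketch with hypothetical Demo.* schemas: the first query preloads :client through a JOIN, the second leaves it to a separate query issued by the repo. Repo.preload/3 likewise runs one query per association and, per its docs, can run them in parallel outside a transaction (the :in_parallel option). Which strategy is faster depends on the data, so it is worth measuring both.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Demo.Client do
  use Ecto.Schema

  schema "clients" do
    field :name, :string
  end
end

defmodule Demo.Order do
  use Ecto.Schema

  schema "orders" do
    field :status, :string
    belongs_to :client, Demo.Client
  end
end

defmodule Demo.Queries do
  import Ecto.Query

  # One SQL query: client rows come back joined onto the orders.
  def orders_with_client_joined do
    from o in Demo.Order,
      join: c in assoc(o, :client),
      preload: [client: c]
  end

  # Main query, plus one extra query for :client issued by the repo.
  def orders_with_client_separate do
    from o in Demo.Order, preload: [:client]
  end
end

# Building queries needs no database; only executing them does.
IO.inspect(length(Demo.Queries.orders_with_client_joined().joins), label: "joins (join preload)")
IO.inspect(length(Demo.Queries.orders_with_client_separate().joins), label: "joins (separate preload)")
```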