Dealing with Ecto.Multi.insert_or_update/3 and :loaded structs

Hello guys, I stumbled upon this problem today and I was seeking some help…

I have this routine where I need to insert or update a bunch of records in the database in a transaction.
Right now, I’m using Ecto.Multi with multiple steps for each operation.
Just as an example, the structure goes something like this:

  • I have categories, posts and tags
  • Each tag can have a color (red, blue, green) and a unique name
  • Each post can have many tags (post_tags)
  • Each category also can have many tags (category_tags)
  • And finally, a post belongs to a category

Every time I add a new tag to a category, I can choose a name and a color. Then, this recently created tag will be propagated to every post inside the category. Now, tow things can happen:

  1. When a post does not have a tag with that name, the tag is created
  2. When a post already has a tag with that name, the tag is updated/overridden.

Pretty simple, right? So what I’m already doing and why it’s not working:

When adding a tag to a category, post_tags may not exist yet but posts already do (because a post can only be created belonging to a category).
Because of that, when I’m querying the post_tags that needs to be inserted or updated, I do a right_join, which will return a list of PostTag that exists with that name and need to be updated or (because of the right join), a list of PostTag structs with empty (nil) values that I can patch and then insert (for that I use select_merge)

Repo.all(from pt in PostTag, 
    right_join: p in Posts, 
    on: pt.post_id == p.id, 
    select_merge: %{color: ...})

Thing is, this query returns a list of PostTag with #Ecto.Schema.Metadata<:loaded>, and an struct that’s already loaded returns a changeset with #Ecto.Changeset<action: :update> instead of #Ecto.Changeset<action: :insert>. Because of that, when I call Ecto.Multi.insert_or_update/3, I get an Ecto.NoPrimaryKeyValueError message.
Since insert_or_update/2 uses the :state field from Ecto.Schema.Metadata to check if it should insert or update the struct, I can’t find a way to work around that.

Of course, the example I gave is overly simplistic, but the whole reason I’m trying to do this in one go is that otherwise, I’d have to hit the database at least a couple of times more to do the inserts and updates that I need and since this table will have a lot of records, being economic would help a lot.

Just as a note:
I was expecting that running a right_join would return a list of PostTag without the loaded state.
Also, since the ids are nil and the docs say that :loaded represents persisted data, it’s a bit misleading…

The state of the schema is stored in the :state field and allows following values:

  • :built - the struct was constructed in memory and is not persisted to database yet;
  • :loaded - the struct was loaded from database and represents persisted data;
  • :deleted - the struct was deleted and no longer represents persisted data.

The same will happen if I query the database and select a completely different struct, let’s say:

iex> Repo.all(from p in Post, select: %Blog {})
[debug] QUERY OK source="blogs" db=0.0ms idle=94.0ms
SELECT TRUE FROM "blogs" AS p0 []
[
    %Blog{
        __meta__: #Ecto.Schema.Metadata<:loaded, "blogs">, 
        id: nil,
        name: nil,
        url: nil 
    }
] 

I guess the part about „… and represents persisted data“ is not quite correct, but those post tags are loaded from the db and I‘m not sure there‘s even a way for ecto to determine that you‘re selecting missing things from the db instead of existing ones.

Exactly, It may be related to the whole “Ecto is not an ORM” thing, but the docs should have a reasonable explanation about this behavior.
.
The same behavior happens with Ecto.load/2, it just “loads” data into the struct, it doesn’t care if the struct is actually persisted which for me at least, is very counter-intuitive. Seems a combination of Kernel.struct/2 that will also change the __meta__ attribute to :loaded

Since frameworks implement the concepts of “attach” and “detach” an object from the database, I guess one might expect the same behavior at first.

In the strict sense that the database operation was run and returned, yes, but the terminology is confusing because those “post tags” are not actually in the database to be loaded loaded.
Does my confusion make sense to you?

I don’t get what benefits do you get out of this

Also are you trying to have something like in the forum where you can add or create tag if it doesn’t exist?
If you are then you should look at Constraints and Upserts — Ecto v3.11.1

The real use case is a bit more complex than this one. But the way it’s supposed to work is that by having “category_tags”, it helps to create a sense of hierarchy when you could have millions of “posts_tags” and you want to:

  1. Edit a bunch of “post_tags” without interacting with it directly
  2. Keep consistency with inherited “tags” for “posts” in the same “category”
  3. When you can have optional inherited “tags” configurations
  4. Configure and propagate settings in one place

I guess this would probably work, but it’s unfortunate that my previous approach doesn’t by default. This pattern of selecting data into a “dto” and then being able to update or insert another entity is very handy.

1 Like