Ecto transaction rolling back with no exceptions? Weird behavior or not missing docs!?

Hi everyone, it seems I can鈥檛 get out of this forum this week :sweat_smile: (you guys are awesome btw).

So, I got surprised by this behavior while using Ecto and I鈥檓 still banging my head against the wall with some conflicting information.

A coworker opened this issue: Updating a record in a Repo transaction failure 路 Issue #3944 路 elixir-ecto/ecto 路 GitHub yesterday where he describes something unexpected we found with how transactions work.

The context of the whole thing is in the issue, but to summarize鈥 The docs about transaction/2 say:

If an unhandled error occurs the transaction will be rolled back and the error will bubble up from the transaction function. If no error occurred the transaction will be committed when the function returns. A transaction can be explicitly rolled back by calling rollback/1, this will immediately leave the function and return the value given to rollback as {:error, value}

This is what we are trying to do鈥 We always have a 鈥渢ag鈥 persisted after trying to save a 鈥減ost鈥. If we are unable to persist the 鈥渢ag鈥, we also don鈥檛 care about saving the 鈥減ost鈥:

Repo.transaction(fn repo ->
  case repo.insert(changeset) do
    {:ok, post} ->
     # Great! Please also save the tag confirming it
     # If we can't just explode and rollback...
      tag
      |> Ecto.Changeset.change(%{name: "success"})
      |> repo.update!()
    {:error, changeset} ->
     # Oh noes! Please save some information about it
     # If we can't save just explode and rollback...      
      tag
      |> Ecto.Changeset.change(%{name: "failure"})
      |> repo.update!()
    end
end)

What surprised us that is that this operation fails, even though we are handling the errors in the first insert. This is the exception raised when the code gets to the repo.update! (this is from the adapter, so it also happens with the safe alternative repo.update):

** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction) current transaction is aborted, commands ignored until end of transaction block

Ok, apparently there鈥檚 something preventing the transaction from succeeding. After conducting some tests it seems that repo.insert is putting the transaction in a bad state and repo.update! can鈥檛 succeed because of it. We had that confirmed by jose and as it seems, this error is happening because an operation already failed inside the transaction. He also seems to agree that the docs are talking about exceptions and not just 鈥渆rrors鈥. So, what exactly are we missing here?

If the docs are right about the transaction rolling back on exceptions, why can鈥檛 this code succeed? If not, and we agree that there鈥檚 something to be improved in the docs, what should the correct behavior be?

I鈥檓 certain that there鈥檚 some important information I鈥檓 missing, but I鈥檓 feeling a bit misguided by the docs and we want to improve it for posterity.

I think this is normal database behaviour. Whatever database you are using shouldn鈥檛 let you continue with the same transaction if a previous command caused an error. This is an error from the database not from Ecto:

** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction)

Yep, we found that out ourselves afterward, but still unexpected considering the docs description. I guess it鈥檚 perhaps one of those 鈥淓cto is not your ORM鈥 moments :blush:.

Still, there are some improvements to be made to the docs and we were already feeling kinda stressed out by apparently contradictory information. I鈥檓 honestly kinda drained at the point of having things spelled out to me.


BTW, just to vent a little bit: It鈥檚 not immediately obvious to more experienced people that sometimes questions like this one are a symptom of some sort of cognitive dissonance or lack of broader understanding about a subject. So answering those questions too objectively, without taking into account the reason for such misalignment of information might worsen the experience even more and in the end, make people more stressed overall. I know this is a technical community, but even so, we are still only people :slightly_smiling_face:.

What I guess it鈥檚 going on is that the first insert failed because it was violating some DB constraints, but this DB error was caught by your changeset (perhaps using unique_constraint/3 or similar) and converted into a changeset error. So the exception has been handled in the code, but the DB transaction is still tainted by the error and unable to continue (this is what the DB error is telling you).

I guess the documentation may be technically correct in that it says nothing about what happens when an error is handled :slight_smile: That said, it would make sense to describe this possible situation.

All of this of course assuming that my above assumption is correct.

Hey, @trisolaran this was intuitively what we thought at first, but look at this repro over here: ecto-transaction-flow-sample/script.exs at master 路 rafaeliga/ecto-transaction-flow-sample 路 GitHub. There is a constraint check in the changeset but it doesn鈥檛 seem to be enough to prevent this from happening.

Technically correct can be the best kind of correct :sweat_smile:, but unfortunately not enough to help our thick skulls I鈥檓 afraid!? Jokes aside, We will try to help with that if we can so no one else goes through the same experience ever again.

Yeah that鈥檚 what I meant. unique_constraint doesn鈥檛 prevent the DB error, it works by waiting for the DB error to happen on insert, catching it and putting it into the changeset. What you鈥檙e seeing is expected. When you鈥檙e calling insert here you鈥檙e triggering a DB error. unique_constraint is simply catching the error for you and turning it into a nice human friendly error for your changeset, but the DB transaction is still in an error state.

2 Likes

Although a little unintuitive at first; considering that Ecto doesn鈥檛 do much for those cases it makes sense that it behaves like that. Thanks for the clarifications @trisolaran :fist_right::fist_left:.

So as it seems, it simply isn鈥檛 possible to wrap two operations in the same transaction if one of them relies on checking any kind of constraints.

That question that remains is if there鈥檚 another way to solve this without having to manually check for the existence of the record in the then to prevent tainting the transaction. Any particular suggestions!?

You could do an insert with on_conflict: :nothing. This will not cause an error.

1 Like

Wow!!! :clap::clap::clap:

I have never thought about using the :on_conflict option with intent other than for upserts but this makes complete sense :see_no_evil:. Excellent, excellent observation @joey_the_snake!!! :fist_right::boom::fist_left:

For whatever it鈥檚 worth, this issue has been confounding people for a long time (note the date!):

2 Likes

I鈥檓 afraid that won鈥檛 help. on_conflict: :nothing will prevent Ecto from raising, but the DB doesn鈥檛 care about it: the conflict violation will still have occurred, your post will not be inserted and the transaction will be in an error state. Plus you will not get any feedback on whether you were able to successfully insert your post or not, and you want to know that, right?

My suggestion: rollback your transaction at the first error and handle error reporting and logging outside of it.

Why do you think this? ON CONFLICT DO NOTHING is a database command that avoids raising the error. I have used it in production. PostgreSQL: Documentation: 14: INSERT

The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error

You鈥檙e right, thanks for the clarification. Sure, no error will be raised, but the DB won鈥檛 insert anything in the DB yet insert will return {:ok, post} as if something was inserted.

Whether or not that鈥檚 fine is up to the developer though, no?

I personally wouldn鈥檛 want that in my code and I would find it confusing if I found it in other codebases. In any case, in this specific example, it鈥檚 an issue because @thiagomajesk wans to distinguish between {:ok, post} and {:error, changeset} and be able to take different actions. My point was that he can simplify things and avoid this entire issue by moving his error management code outside of the transaction

That鈥檚 right @trisolaran! Just noticed this behavior testing today鈥 Here鈥檚 the log if someone else is curious about it (from the repro I linked in the issue):

10:05:40.814 [debug] QUERY OK db=2.3ms queue=0.3ms
INSERT INTO "tags" VALUES (DEFAULT) RETURNING "id" []

10:05:40.819 [debug] QUERY OK db=2.4ms queue=0.2ms
INSERT INTO "posts" ("title") VALUES ($1) RETURNING "id" ["123"]

10:05:40.819 [debug] QUERY OK db=0.1ms
begin []

10:05:40.819 [debug] QUERY OK db=0.1ms
INSERT INTO "posts" ("title") VALUES ($1) ON CONFLICT DO NOTHING RETURNING "id" ["123"]
POST: %Post{__meta__: #Ecto.Schema.Metadata<:loaded, "posts">, id: nil, title: "123"}

10:05:40.822 [debug] QUERY OK db=0.1ms
commit []

I can see that the error was suppressed correctly but the 鈥減ost鈥 not persisted (id == nil).
I see how this is unintuitive that we are returning :ok for a failed operation and it wouldn鈥檛 be my first choice either. Since this is a corner case, perhaps we could solve it by pattern-matching and just forget about it:

Repo.transaction(fn repo ->
  case repo.insert(changeset) do
    {:ok, %{id: nil}} ->
       # failed first, we don't care anymore
       Repo.rollback(:post)
    {:ok, post} ->    
      tag
      |> Ecto.Changeset.change(%{name: "success"})
      |> repo.update!()
    {:error, changeset} -> 
      tag
      |> Ecto.Changeset.change(%{name: "failure"})
      |> repo.update!()
    end
end)

The real problem with this approach is that we don鈥檛 have any information whatsoever about why this has failed and in this specific scenario with no information about the constraint to report to users.

How would you do that if both operations are dependent on each other? I mean, they somehow have to be inside the same transaction to guarantee that we save at least one of them, right!?

BTW I see someone else suggesting the usage of savepoints, but I haven鈥檛 used those in Ecto before. I鈥檓 not sure how would the fit in this scenario.

I wonder why you absolutely need to work with these 鈥渆rror鈥 and 鈥渇ailure鈥 tags in the DB and can鈥檛 instead log the error or show it to the user?

However, putting this aside, how about this:

  • write a 鈥渇ailure鈥 tag before starting the transaction

Start the transaction:

  • insert the post, if insertion fails, roll back transaction
  • update the 鈥渇ailure鈥 tag to be a 鈥渟uccess鈥 tag
  • commit

This way, if anything goes wrong with insertion, you鈥檒l have a failure tag.
Wouldn鈥檛 this work?

Yes, this is absolutely required!

This flow would work, but I鈥檇 have to clean all 鈥渢ags鈥 not referencing 鈥減osts鈥 in the DB at the end, which is an operation that could fail on its own and leave us with loose 鈥渢ags鈥. Remember that the 鈥渢ag鈥 in this case, can only exist on two occasions: if a 鈥減ost鈥 is persisted and if a 鈥減ost鈥 has errors. It鈥檚 almost like a logging mechanism, if I create the log before and there鈥檚 nothing worth logging I have to remove it somehow.

BTW, It seems someone else already had this exact same problem: How to not rollback transaction on failed insert because of unique constraint - #8 by benwilson512. I conducted some tests using one of the post鈥檚 suggestions and the savepoint feature seems to work without much effort. The only difference is that no additional match on the case was required:

Repo.transaction(fn repo ->   
  case repo.insert(changeset, mode: :savepoint) do
    {:ok, post} ->
      IO.inspect(post, label: "POST")

      tag
      |> Ecto.Changeset.change(%{name: "success"})
      |> repo.update!()

      {:error, changeset} ->
        IO.inspect(changeset, label: "CHANGESET")

        tag
        |> Ecto.Changeset.change(%{name: "failed"})
        |> repo.update!()
    end
end)

This returns the tuple with the changeset containing the constraint errors and commits the transaction as expected:

12:31:46.616 [debug] QUERY OK db=1.6ms queue=0.2ms
INSERT INTO "tags" VALUES (DEFAULT) RETURNING "id" []

12:31:46.620 [debug] QUERY OK db=1.5ms queue=0.1ms
INSERT INTO "posts" ("title") VALUES ($1) RETURNING "id" ["123"]

12:31:46.620 [debug] QUERY OK db=0.1ms
begin []

12:31:46.622 [debug] QUERY ERROR db=1.5ms
INSERT INTO "posts" ("title") VALUES ($1) RETURNING "id" ["123"]
CHANGESET: #Ecto.Changeset<
  action: :insert,
  changes: %{title: "123"},
  errors: [
    title: {"has already been taken",
     [constraint: :unique, constraint_name: "posts_title_index"]}
  ],
  data: #Post<>,
  valid?: false
>

12:31:46.624 [debug] QUERY OK db=0.2ms
UPDATE "tags" SET "name" = $1 WHERE "id" = $2 ["failed", 1]

12:31:46.626 [debug] QUERY OK db=1.8ms
commit []

What happens if an exception is thrown when using mode: :savepoint? For instance could this happen:
1. insert succeeds
2. update fails
3. transaction goes to savepoint directly before the update
4. transaction completes successfully

edit: i didn鈥檛 notice the savepoint is per statement and not on the transaction. pls ignore.