DB(dis)Connection - ERROR: could not serialize access due to concurrent update

nhpip · September 18, 2022, 4:22pm

Hi,

We are using DBConnection and Ecto with a Postgres DB. Occasionally we get errors like below in the Postgres log:

2022-09-17 12:36:59.478 EDT [92184] ERROR:  could not serialize access due to concurrent update
2022-09-17 12:36:59.478 EDT [92184] CONTEXT:  SQL statement "update collections
	        set scoped_seq = scoped_seq + 1
	      where id = collection_id
	  returning scoped_seq"
	PL/pgSQL function advance_seq(uuid) line 5 at SQL statement
	SQL statement "select advance_seq(new.collection_id)"
	PL/pgSQL function set_rev_id() line 3 at SQL statement

90% of the time this results in a Postgrex.Error exception with a code of :serialization_failure which permits us to retry the transaction.

However, in a small percentage of cases we get the same error in the Postgres log, but instead a DBConnection.ConnectionError exception with a message of:

12:17:13.217 [error] Postgrex.Protocol (#PID<0.2447.0>) disconnected: ** (DBConnection.ConnectionError) transaction rolling back

My questions are:

Why do we sometimes get a Postgrex error and sometimes a DBConnection disconnect for what appears to be the same error?
Why should it disconnect?
Can we stop it from disconnecting (or force a reconnect)?

Now I can retry the transaction further up the stack, but that involves re-doing lots of data processing and doesn’t always work.

We are running latest of db_connection, ecto and postgrex.

Thanks

joey_the_snake · September 19, 2022, 2:29pm

In general Postgrex.Error means your query went through to the database and the response from the database was an error. DBConnection.Error there wasn’t a clean send/receive to/from the db.

In your case it looks like that error comes when you try to execute a query on a transaction that is already rolling back. So this puts the connection in an erroneous state and it raises/disconnects.

A wild guess: you are retrying in the same transaction you receive the error in instead of letting the transaction completely roll back and starting a new one.

nhpip · September 19, 2022, 2:53pm

Thanks, that makes total sense. It was before my time, but the implementation is kind of weird. The pseudo code looks like:

defmodule Server do

  def transaction(mod, fun, args, .....) do
     try do
        Repo.transaction(fn ->
           ....
           case apply(mod, fun, args) do
             {:ok, rsp} -> rsp
             {:error, e} -> Repo.rollback(e)
       end)   
     rescue
       e in Postgrex.Error ->
        repeat_logic_if_serializatiion_error(...)
     end
  end

end

Now the problem is that mod, fun can also invoke Server.transaction, with a different mod, fun. Indeed it can get quite deep. So it’s possible that the exception can get caught and repeated in an inner layer.

Thanks again…