DB record commited in Repo.transaction is sometimes not immediately able to be queried back in a new spawned process

So, the short question is in what scenario will a db record created in Repo.transaction will not be visible by another process given that the query is made after the transaction is done.

Longer version:
I am having a weird issues where it only happen rarely and so far I can’t reproduce it in any simple setup in my local env. But the prod code can be simplified roughly as below,

ecto_multi
|> Repo.transaction()
|> case do
   {:ok, %{id: id}} ->
      spawn(fn -> do_some_not_critical_thing(id) end)
end

def do_some_not_critical_thing(id) do
   # here we are simply querying the record using the id
end

So in the production, we sometime(probably 1 out of 100) get error complaining the record is not found with that id. After spending quite some time researching about it, my best guess is the new spawned connection is using an existing db connection and that db connection is doing work that’s under transaction block ( BEGIN) before the new record is created and then select query is somehow “put” inside the transaction block and thus the newly created db record in parent process(another db connection) is not visible to the new process db connection?

I wonder if anyone had this similar issue before?

This probably doesn’t answer your question directly but have you tried to just use the second element of the tuple you get in response? That should contain the record you are interested in (if it’s only one thing that the Ecto.Multi inserts/updates; otherwise they should be several).

Additionally, is this happening in your tests, or while you are fiddling in iex?

Thank you for your reply. Yea for this particular case, I can pass the whole struct instead of querying again in the new process. But I am just curious what’s the root cause of it.

I currently only sometime have this issue in production and I am using Postgres and mix release for bunding runtime

I admit I am at a loss. I’d expect such a basic thing to work without any need for tine-tuning indeed.

Are you able to create a reproduceable example? If the transaction has committed, the record is was written. If the record is not there, you’re either somehow running the spawned function before the transaction has committed, or the record has been removed.

I was trying to make reproducible setup in local but I failed it. And I was 100% sure the ecto query call in new spawn process is right after Repo.transaction and the record did insert into DB.(I match it for :ok tuple case and also able to get the record back when I debug it in remote console

Perhaps related to your database transaction isolation?

If the two processes don’t share the same database connection and transaction isolation is READ COMMITTED, the new row might not be available until commit time/end of the Repo.transaction closure.

A simple test would be to open two SQL clients, start two transactions, run some updates on one and make sure they’re visible or not on the other transaction.

EDIT: your code above should work as expected, unless it’s in a transaction itself. A top level Repo.transaction and query immediately after will work.

Would help knowing which database, version and adapter version you’re using.

3 Likes

Ah this is a great observation. If the full code block that @bruteforcecat posted is itself called from within an outer Repo.transaction, then the inner Repo.transaction call does NOT commit.

You are right. I think that’s the most likely cause but the weird thing is the ecto query is called in a new process and this is the only DB call so even the new process checkout a different connection I still don’t understand how it could end up being in a transaction that Begin earlier than the the parent transaction commit. As you can see the sample code above, the spawn called is clearly called after Repo.transaction

This will work:

ecto_multi
|> Repo.transaction()
|> case do
   {:ok, %{id: id}} ->
      # Here the row HAS TO BE available
      spawn(fn -> 
        # Available here as well
      end)
end

but if nested might not work as intended:

Repo.transaction(fn -> 
  ecto_multi
  |> Repo.transaction()
  |> case do
     {:ok, %{id: id}} ->
        # Here the row will be available (not 100% sure)
        spawn(fn ->
          # Here the row MIGHT NOT BE available, unless ISOLATION LEVEL = READ UNCOMMITTED
        end)
  end
end)

EDIT: in fact seems like the READ UNCOMMITTED workaround might be impossible on PostgreSQL: “PostgreSQL’s Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL’s multiversion concurrency control architecture.” — So PostgreSQL will always return committed data on another connection.

7 Likes

From docs Ecto.Repo — Ecto v3.11.1

If transaction/2 is called inside another transaction, the function is simply executed, without wrapping the new transaction call in any way. If there is an error in the inner transaction and the error is rescued, or the inner transaction is rolled back, the whole outer transaction is marked as tainted, guaranteeing nothing will be committed.

1 Like

I checked our code again. There is some places that’s calling the function inside Multi.run so I think that’s the root cause. Thank you. :pray:t4:

4 Likes