DB record commited in Repo.transaction is sometimes not immediately able to be queried back in a new spawned process

bruteforcecat · March 17, 2020, 11:17am

So, the short question is in what scenario will a db record created in Repo.transaction will not be visible by another process given that the query is made after the transaction is done.

Longer version:
I am having a weird issues where it only happen rarely and so far I can’t reproduce it in any simple setup in my local env. But the prod code can be simplified roughly as below,

ecto_multi
|> Repo.transaction()
|> case do
   {:ok, %{id: id}} ->
      spawn(fn -> do_some_not_critical_thing(id) end)
end

def do_some_not_critical_thing(id) do
   # here we are simply querying the record using the id
end

So in the production, we sometime(probably 1 out of 100) get error complaining the record is not found with that id. After spending quite some time researching about it, my best guess is the new spawned connection is using an existing db connection and that db connection is doing work that’s under transaction block ( BEGIN) before the new record is created and then select query is somehow “put” inside the transaction block and thus the newly created db record in parent process(another db connection) is not visible to the new process db connection?

I wonder if anyone had this similar issue before?

dimitarvp · March 17, 2020, 11:39am

This probably doesn’t answer your question directly but have you tried to just use the second element of the tuple you get in response? That should contain the record you are interested in (if it’s only one thing that the Ecto.Multi inserts/updates; otherwise they should be several).

Additionally, is this happening in your tests, or while you are fiddling in iex?

bruteforcecat · March 17, 2020, 12:21pm

Thank you for your reply. Yea for this particular case, I can pass the whole struct instead of querying again in the new process. But I am just curious what’s the root cause of it.

I currently only sometime have this issue in production and I am using Postgres and mix release for bunding runtime

dimitarvp · March 17, 2020, 12:33pm

I admit I am at a loss. I’d expect such a basic thing to work without any need for tine-tuning indeed.

benwilson512 · March 17, 2020, 1:47pm

Are you able to create a reproduceable example? If the transaction has committed, the record is was written. If the record is not there, you’re either somehow running the spawned function before the transaction has committed, or the record has been removed.

bruteforcecat · March 17, 2020, 1:56pm

I was trying to make reproducible setup in local but I failed it. And I was 100% sure the ecto query call in new spawn process is right after Repo.transaction and the record did insert into DB.(I match it for :ok tuple case and also able to get the record back when I debug it in remote console

1player · March 17, 2020, 3:58pm

Perhaps related to your database transaction isolation?

If the two processes don’t share the same database connection and transaction isolation is READ COMMITTED, the new row might not be available until commit time/end of the Repo.transaction closure.

A simple test would be to open two SQL clients, start two transactions, run some updates on one and make sure they’re visible or not on the other transaction.

EDIT: your code above should work as expected, unless it’s in a transaction itself. A top level Repo.transaction and query immediately after will work.

Would help knowing which database, version and adapter version you’re using.

benwilson512 · March 17, 2020, 4:03pm

Ah this is a great observation. If the full code block that @bruteforcecat posted is itself called from within an outer Repo.transaction, then the inner Repo.transaction call does NOT commit.

bruteforcecat · March 17, 2020, 4:03pm

You are right. I think that’s the most likely cause but the weird thing is the ecto query is called in a new process and this is the only DB call so even the new process checkout a different connection I still don’t understand how it could end up being in a transaction that Begin earlier than the the parent transaction commit. As you can see the sample code above, the spawn called is clearly called after Repo.transaction

1player · March 17, 2020, 4:06pm

This will work:

ecto_multi
|> Repo.transaction()
|> case do
   {:ok, %{id: id}} ->
      # Here the row HAS TO BE available
      spawn(fn -> 
        # Available here as well
      end)
end

but if nested might not work as intended:

Repo.transaction(fn -> 
  ecto_multi
  |> Repo.transaction()
  |> case do
     {:ok, %{id: id}} ->
        # Here the row will be available (not 100% sure)
        spawn(fn ->
          # Here the row MIGHT NOT BE available, unless ISOLATION LEVEL = READ UNCOMMITTED
        end)
  end
end)

EDIT: in fact seems like the READ UNCOMMITTED workaround might be impossible on PostgreSQL: “PostgreSQL’s Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL’s multiversion concurrency control architecture.” — So PostgreSQL will always return committed data on another connection.

wanton7 · March 17, 2020, 4:10pm

From docs Ecto.Repo — Ecto v3.11.1

If transaction/2 is called inside another transaction, the function is simply executed, without wrapping the new transaction call in any way. If there is an error in the inner transaction and the error is rescued, or the inner transaction is rolled back, the whole outer transaction is marked as tainted, guaranteeing nothing will be committed.

bruteforcecat · March 18, 2020, 1:09am

I checked our code again. There is some places that’s calling the function inside Multi.run so I think that’s the root cause. Thank you.