Ecto deadlocks when running tests on Elixir 1.10

I have an app using ecto 3.3.2, ecto_sql 3.3.3, and postgrex 0.15.3. Everything’s been fine under Elixir 1.9.4 (OTP 22), but when I try to run my test suite under Elixir 1.10.0 or 1.10.11 (also OTP 22), I get the following error intermittently:

     ** (Postgrex.Error) ERROR 40P01 (deadlock_detected) deadlock detected

         hint: See server log for query details.

     Process 23951 waits for ShareLock on transaction 6852; blocked by process 23950.
     Process 23950 waits for ShareLock on transaction 6867; blocked by process 23951.

The server log says:

     ERROR:  deadlock detected
     DETAIL:  Process 23951 waits for ShareLock on transaction 6852; blocked by process 23950.
     Process 23950 waits for ShareLock on transaction 6867; blocked by process 23951.
     Process 23951: INSERT INTO "works" ("accession_number","administrative_metadata","descriptive_metadata","published","visibility","work_type","inserted_at","updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id"
     Process 23950: INSERT INTO "works" ("accession_number","administrative_metadata","descriptive_metadata","published","visibility","work_type","inserted_at","updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id"
     HINT:  See server log for query details.
     CONTEXT:  while inserting index tuple (0,13) in relation "works_accession_number_index"
     STATEMENT:  INSERT INTO "works" ("accession_number","administrative_metadata","descriptive_metadata","published","visibility","work_type","inserted_at","updated_at") VALUES ($1,$2,$3,$4,$5,$6,$7,$8) RETURNING "id"

It’s not every time, and it’s not always on the same test(s). Has anyone else noticed any differences between Elixir 1.9 and 1.10 in this regard? Given that the error involves a transaction, and we use very few of them in our application code, I suspect this might have something to do with the Ecto SQL Sandbox?

MBK

Still having this issue with Elixir 1.10.2, so I’m giving the topic a bump to see if anyone has any ideas. (I also just noticed the typo in the OP that turned 1.10.1 into 1.10.11, but oh well.)

On my machine Ecto’s compilation occasionally deadlocks. :frowning: I am on Elixir 1.10.2-otp-22 and Erlang 22.3.2.

Confirmed that this issue persists under Elixir 1.10.3 (OTP 22) + Ecto 3.4.3 + Postgrex 0.15.3.

Please see this issue. If you can get a crash dump, it would be VERY appreciated.

Your deadlock is coming from the database. In this case, the Elixir version shouldn’t be the root cause; perhaps it just pokes the database in a slightly different way that makes the deadlock more common. You probably want to make sure you are using unique values for all unique indexes. Please see the docs for more info.

Thanks. I couldn’t figure out why the Elixir version would be an issue, but I mentioned it prominently since it seems to be the only consistent difference between my “never deadlocks” and “sometimes deadlocks” runs. I’ll look into whether changing the test data might help.

Problem is, I never got to a crash dump. :confused: I’ll retry several of my projects and see if I can make the runtime crash by waiting for an hour or more. Most I waited was about 10 minutes (and believe me, on my machine Ecto’s 50+ files usually compile in 1-2 secs!) but never got any output. I’ll see if I can change that.

EDIT: I’ll also try sending SIGUSR1 as described in the issue.

Yeah, don’t wait for a crash dump, use an ABORT signal at the OS level or a crash dump signal from the break menu. See comment here: https://github.com/elixir-lang/elixir/issues/9980#issuecomment-620613677

Just did that, 3 times, no crash dump and I tried both methods. :frowning: I’ll retry sometime later.

Thanks for this, José. I changed the setup to add a random prefix to our fixtures’ unique fields and all is well.
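
For anyone who lands here with the same error, here is a minimal sketch of that kind of change. The module name `MyApp.Fixtures`, the helper names, and the prefix format are hypothetical and not taken from the app in this thread; the only library call is `System.unique_integer/1` from the Elixir standard library. It assumes the unique index is on `accession_number`, as in the server log above.

```elixir
defmodule MyApp.Fixtures do
  # Hypothetical fixture helpers; names and formats are illustrative only.

  # Prepends an integer that is unique within the current VM run, so two
  # tests running concurrently never try to insert the same value into a
  # unique index. Uniqueness does not hold across separate test runs.
  def unique_accession_number(base \\ "accession") do
    "#{System.unique_integer([:positive])}_#{base}"
  end

  # Builds attributes for a work. Only the unique field gets a generated
  # value; callers can still override it explicitly.
  def work_attrs(overrides \\ %{}) do
    Map.merge(%{accession_number: unique_accession_number()}, overrides)
  end
end
```

A test would then build its record from `MyApp.Fixtures.work_attrs()` instead of a hard-coded accession number. The likely reason this helps: with concurrent tests, each test holds its own open transaction, and an insert that collides on a unique index has to wait for the other transaction to finish. If two tests collide on each other's values in opposite order, each waits on the other and Postgres reports the deadlock.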
