Postgres deadlocks when migrating a sharded db

Hello. We have a sharded database setup where we…

  1. Have many Postgres servers
  2. Each server has one Postgres database
  3. Each database has many schemas (shards)

I wrote a mix task to migrate our shards:

It’s pretty conceptually simple, it starts up each repo (one for each Postgres server/database), then calls the public function Ecto.Migrator.run(repo, migrations, dir, opts), passing in the :prefix (aka schema) as part of the opts.

The problem is that we’ll get these errors intermittently:

18:50:43.593 [info]  alter table schema_foo.form_fields
** (Postgrex.Error) ERROR 40P01 (deadlock_detected) deadlock detected
    hint: See server log for query details.
Process 26072 waits for AccessExclusiveLock on relation 5501621 of database 5489250; blocked by process 17190.
Process 17190 waits for AccessShareLock on relation 5501803 of database 5489250; blocked by process 26072.
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:593: Ecto.Adapters.SQL.raise_sql_call_error/1
    (elixir 1.11.2) lib/enum.ex:1399: Enum."-map/2-lists^map/1-0-"/2
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:686: Ecto.Adapters.SQL.execute_ddl/4
    (ecto_sql 3.4.5) lib/ecto/migration/runner.ex:343: Ecto.Migration.Runner.log_and_execute_ddl/3
    (ecto_sql 3.4.5) lib/ecto/migration/runner.ex:117: anonymous fn/6 in Ecto.Migration.Runner.flush/0
    (elixir 1.11.2) lib/enum.ex:2181: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ecto_sql 3.4.5) lib/ecto/migration/runner.ex:116: Ecto.Migration.Runner.flush/0
    (stdlib 3.13.2) timer.erl:166: :timer.tc/1

Simply rerunning the mix task usually fixes the problem.

Any idea why this happening? Thanks for the help.

1 Like