Ecto.Migrator - as of 3.6.0, it doesn't only lock the schema_migrations table, so I think documentation might be updated

… or prove me wrong :slight_smile:

I look at this part of the code and it seems like now the intention is to lock the REPO, and then run both actual migrating SQL + schema_migrations update in one transaction.

That part definitely got updated from 3.5 → 3.6, and I know because at work I had to grok that part of the code.

However I’m not that advanced in Elixir so I would like some additional feedback.

The code in question is ecto_sql/migrator.ex at v3.6.2 · elixir-ecto/ecto_sql · GitHub

  defp async_migrate_maybe_in_transaction(repo, config, version, module, direction, opts, fun) do
    dynamic_repo = repo.get_dynamic_repo()

    fun_with_status = fn ->
      result = fun.()
      apply(SchemaMigration, direction, [repo, config, version, opts[:prefix]])

    fn -> run_maybe_in_transaction(repo, dynamic_repo, module, fun_with_status) end
    |> Task.async()
    |> Task.await(:infinity)

  defp run_maybe_in_transaction(repo, dynamic_repo, module, fun) do

    if module.__migration__[:disable_ddl_transaction] ||
         not repo.__adapter__().supports_ddl_transaction? do
      {:ok, result} =
        repo.transaction(fun, log: false, timeout: :infinity)

  catch kind, reason ->
    {kind, reason, __STACKTRACE__}

Why I’m writing this, is twofold:

  1. I want to ask if my understanding is correct of how this code operates. We’re using MyXQL library, and was using 3.5.1 ecto_sql. Well lo and behold - DB connection got cut right between finishing migration and updating schema_migrations table! And that seems to match the 3.5.1 code’s logic of these DB operations not in a transaction, so in my company’s case, the schema_migrations table CAN go out of sync.
  • However, with 3.6.2’s async_migrate_maybe_in_transaction update, each migration is paired with a schema_migration in a transaction (via composition in fun_with_status), so hopefully we now rely on MySQL’s transactional guarantees to rollback in the event of DB Connection being cut off.
  1. I don’t quite understand the async side of things. The documentation states the following:

In order to run migrations, at least two database connections are necessary. One is used to lock the “schema_migrations” table and the other one to effectively run the migrations. This allows multiple nodes to run migrations at the same time, but guarantee that only one of them will effectively migrate the database.

  • I don’t think that’s the case anymore. Locking happens at the Repo level, and default pool size is set to 2. All migration+schema_migrations run in pair under transaction, and they are done in async via Task. Therefore it just uses connections alotted in the pool for each migration.

  • But, I’m not quite sure what happens as of 3.6.2 - if it is done in async, and lock as at Repo level, then do these calls just wait until the lock is removed at MySQL’s side? Wouldn’t async not guarantee ordering of migrations?

Please correct me if my understanding is incorrect. I spent the whole day yesterday and will spend today working on this at work, and I’m excited to learn! Love grokking at this core piece of code :smiley:

EDIT: this is the code where SQL operations locks at the REPO level ( I think)

MySQL (the database engine, not the adapter) does not support transactional DDL operations, so run_maybe_in_transaction does not use a transaction:

1 Like

Oh man, that’s not something I wanted to know… :frowning: thanks for the information! I am so used to Postgres I just assumed DDL transaction is supported in MySQL.

Either way, the 3.6.2 code changes for async_migrate_maybe_in_transaction is still increasing likelihood of both operations running in one connection right?

Before, in 3.5.1, the migration operations ran in a Task process while the schema_migrations update ran in another, which seems to guarantee they ran in two separate connections.

I mean, even still, there’s no guarantee of any rollback if one of the two DDL operations fail… so if DB connection disrupts, then we still have the same possible migration inconsistency… hm