Change my mind: Migrations in a start phase

Overbryd · October 24, 2019, 4:22pm

# in my mix.exs

  def application do
    [
      mod: {My.Application, []},
      start_phases: [{:migrate, []}],
      extra_applications: [:logger, :runtime_tools]
    ]
  end

# in lib/my/application.ex

  def start_phase(:migrate, _, _) do
    Ecto.Migrator.with_repo(My.Repo, &Ecto.Migrator.run(&1, :up, all: true))
    :ok
  end

Works every time™
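OTP calls `start_phase/3` only after `start/2` has returned. That means the entire supervision tree is already running when the migration phase begins, which matters for the ordering question raised in the replies. The sketch below makes this visible. It assumes a hypothetical `:demo_app` whose app spec is loaded by hand, where a real project would generate it from mix.exs:

```elixir
defmodule DemoApp do
  # Hypothetical application module, only used to show when start phases run.
  use Application

  @impl true
  def start(_type, _args) do
    # Start the (empty) supervision tree first, then record that it is up.
    result = Supervisor.start_link([], strategy: :one_for_one, name: DemoApp.Supervisor)
    record(:tree_started)
    result
  end

  @impl true
  def start_phase(:migrate, _start_type, _phase_args) do
    # OTP calls this only after start/2 above has returned.
    record(:migrate_phase)
    :ok
  end

  # Append an event to this app's own environment so the order can be inspected.
  defp record(event) do
    events = Application.get_env(:demo_app, :events, [])
    Application.put_env(:demo_app, :events, events ++ [event])
  end
end

# In a Mix project, mix.exs produces this spec; here it is loaded by hand.
app_spec =
  {:application, :demo_app,
   [
     description: ~c"start phase demo",
     vsn: ~c"0.1.0",
     modules: [DemoApp],
     registered: [],
     applications: [:kernel, :stdlib],
     mod: {DemoApp, []},
     start_phases: [migrate: []]
   ]}

:ok = :application.load(app_spec)
{:ok, _started} = Application.ensure_all_started(:demo_app)
IO.inspect(Application.get_env(:demo_app, :events), label: "order")
```

The `start_phases: [{:migrate, []}]` entry in mix.exs sets the same app-spec key. As a result, every child in the tree, including anything that depends on the migrated schema, has already started before the phase runs.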

benwilson512 · October 24, 2019, 4:35pm

Interesting!

I think the main downside compared to having the migrator in your supervision tree is that it doesn’t let you control where the migration happens relative to other items in your supervision tree. For example, we have basically:

[
  libcluster_child(),
  Sensetra.Endpoint,
  {Absinthe.Subscription, Sensetra.Endpoint},
  Sensetra.Repo,
  Sensetra.Repo.Migrator,
  Sensetra.Ingestion.Super,
  # ... other children
  DeploymentNotifier
]

This is important because it lets the Ingestion.Super process be sure that any database changes it relies on have definitely happened by that point, since the Migrator has already run.

You’ll also note that I start the Endpoint pretty early. The way that works is that the /alive path returns true, but the /ready path returns false. This lets Kubernetes know that the pod is alive and running, but is not yet ready to receive traffic. It can take its time to run migrations, get the process tree up and running, and then the DeploymentNotifier child sets an application environment value such that /ready returns true.
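Here is a minimal sketch of the readiness half. It assumes a hypothetical `:my_app` application and a `:ready` env key, since the post names neither. The notifier sits last in the children list, so its `init/1` runs only after every earlier sibling has started, including a blocking migrator. A `/ready` handler would then answer based on `ready?/0`. A sleeping stand-in replaces the real migrator so the sketch runs without Ecto:

```elixir
defmodule MyApp.SlowMigrator do
  # Stand-in for a migrator child: blocks in init/1 (where a real one would
  # run Ecto.Migrator), then returns :ignore so no process is kept around.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts) do
    Process.sleep(500)
    :ignore
  end
end

defmodule MyApp.DeploymentNotifier do
  # Hypothetical notifier: flips the readiness flag once every earlier
  # child in the supervisor's list has started.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def ready?, do: Application.get_env(:my_app, :ready, false)

  # What a /ready endpoint might return; /alive would always return 200.
  def ready_status, do: if(ready?(), do: 200, else: 503)

  @impl true
  def init(_opts) do
    Application.put_env(:my_app, :ready, true)
    :ignore
  end
end

IO.inspect(MyApp.DeploymentNotifier.ready_status(), label: "before")

children = [MyApp.SlowMigrator, MyApp.DeploymentNotifier]
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

IO.inspect(MyApp.DeploymentNotifier.ready_status(), label: "after")
```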

blatyo · October 25, 2019, 3:32am

One reason not to would be long-running migrations, which effectively block each instance of your app from starting up.

That's not necessarily to say don't do it, but be careful what you do there. Some migrations are better run live, such as ones that transform data or fill in a column with a default.

Overbryd · November 22, 2019, 11:04am

Awesome answer, thank you.

Having a sensible split between /ready and /alive is a massive improvement and a great example of why to keep migrations situated in the supervision tree.

Overbryd · November 22, 2019, 11:09am

I tend to avoid doing long-running migrations as ORM-driven migrations.

My approach there has always been to rebuild tables and, once they are ready, swap them in instantaneously. I do that in the database itself rather than coupling any application code to it.

I think the classical DBA approach sometimes has value, because it decouples database management from application development.
I don't think you always need to hire a DBA, but simply splitting the two into separate tasks usually moves the heavier database work into its own domain. The application can keep running happily while the database is being worked on separately.

For example, migrations in the supervision tree do not protect you from your liveness probe timing out because of a long-running migration.

benwilson512 · November 22, 2019, 1:33pm

Yeah, I also avoid long-running migrations. For anything that could take a long time, I find a backwards-compatible way to do it, run it via psql, and let it take however long it needs while production carries on happily. Afterward I write an idempotent migration, mostly to keep everyone's dev/test environments up to date, and then deploy that, which should migrate instantly.
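Here is a sketch of such a follow-up migration. It assumes a hypothetical `readings` table whose `device_id` index was already built by hand with `CREATE INDEX CONCURRENTLY`. `create_if_not_exists` makes it a no-op where the index already exists and creates it in fresh dev/test databases. It depends on `ecto_sql` and a database, so it is a project fragment rather than something runnable on its own:

```elixir
defmodule MyApp.Repo.Migrations.AddReadingsDeviceIdIndex do
  # Hypothetical table/column names. In production the index was already built via psql:
  #   CREATE INDEX CONCURRENTLY readings_device_id_index ON readings (device_id);
  use Ecto.Migration

  def up do
    # index(:readings, [:device_id]) defaults to the name readings_device_id_index,
    # so this is a no-op in production and creates the index in fresh databases.
    create_if_not_exists index(:readings, [:device_id])
  end

  def down do
    drop_if_exists index(:readings, [:device_id])
  end
end
```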

idi527 · April 16, 2020, 3:24pm

Hi @benwilson512, sorry for replying to an old thread, but I wonder what Sensetra.Repo.Migrator actually looks like. Is it a GenServer that runs the migrations in its init?

I'm asking because I've tried a similar approach, but running the migrations in a Task, like this:

  def start(_type, _args) do
    migration = fn ->
      :timer.sleep(10000)
      IO.puts("Migrated")
    end

    readiness_notifier = fn ->
      IO.puts("ready to accept requests")
    end

    children = [
      Supervisor.child_spec({Task, migration}, id: :migration),
      Supervisor.child_spec({Task, readiness_notifier}, id: :readiness_notifier)
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: SupOrder.Supervisor]
    Supervisor.start_link(children, opts)
  end

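and they ran concurrently (obviously, in retrospect), because a Task child's start_link returns as soon as the task process is spawned, so the supervisor moves straight on to the next child: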

> iex -S mix
Erlang/OTP 22 [erts-10.7.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]

Compiling 2 files (.ex)
Generated sup_order app
ready to accept requests
Interactive Elixir (1.10.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Migrated

So I guess it's more like this:

  defmodule Migration do
    use GenServer

    def start_link(opts) do
      GenServer.start_link(__MODULE__, opts)
    end

    def init(_opts) do
      :timer.sleep(10000)
      IO.puts("Migrated")
      :ignore
    end
  end

  def start(_type, _args) do
    readiness_notifier = fn ->
      IO.puts("ready to accept requests")
    end

    children = [
      Migration,
      Supervisor.child_spec({Task, readiness_notifier}, id: :readiness_notifier)
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: SupOrder.Supervisor]
    Supervisor.start_link(children, opts)
  end

which results in the correct order:

> iex -S mix
Erlang/OTP 22 [erts-10.7.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]

Compiling 1 file (.ex)
Migrated
ready to accept requests
Interactive Elixir (1.10.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>
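Returning `:ignore` is what makes this work. The supervisor waits for each child's start function to return, and `:ignore` tells it there is no process to keep. If you don't need a GenServer at all, a variant of the same idea is a hand-written child spec whose start function does the work and returns `:ignore`. The `:sup_order`/`:migrated` env key below is hypothetical and exists only to show the ordering:

```elixir
defmodule SupOrder.Migration do
  # Hypothetical alternative to the GenServer above: a bare child spec whose
  # start function does the work synchronously and returns :ignore.
  def child_spec(_arg) do
    %{id: __MODULE__, start: {__MODULE__, :run, []}}
  end

  def run do
    # Stand-in for the real migration work.
    Process.sleep(1_000)
    Application.put_env(:sup_order, :migrated, true)
    IO.puts("Migrated")
    # The supervisor treats :ignore as "nothing to supervise" and moves on.
    :ignore
  end
end

readiness_notifier = fn ->
  migrated = Application.get_env(:sup_order, :migrated, false)
  IO.puts("ready to accept requests (migrated: #{migrated})")
end

children = [
  SupOrder.Migration,
  Supervisor.child_spec({Task, readiness_notifier}, id: :readiness_notifier)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one, name: SupOrder.Supervisor)
```

The trade-off is that nothing remains running afterward to query or restart, which is usually what you want for a one-shot migration step.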