Preferred way to run Ecto migrations in a Kubernetes cluster

ecto
phoenix
migrations
escript

#1

Some environment info

We are hosting our Phoenix application on a Kubernetes cluster using a Dockerfile based on the one in Distillery’s documentation.

We are using peerage to discover the other nodes of this application, which handles node discovery when the cluster is scaled up or down dynamically. The nodes therefore know about each other, so Phoenix Channels can send events to all nodes.

Other possibly useful info

There’s a CI/CD pipeline for each of dev, beta and prod. In dev we want migrations to run automatically, while in beta and prod we may want other settings or steps for this.

Hooks don’t work

The catch is that our previous solution for running Ecto migrations was a pre-start hook in rel/config.exs that ran them before starting the application. However, that runs them every time a node starts or restarts, which is not what I want.

We do not want running migrations to be part of any boot sequence, both to minimize boot times and to make sure we can control them manually should we need to.

Goal

Find a flexible and easy way to run Ecto migrations without adding hooks to the release that run on every pod start and slow down pod startup.

Approach one - Escript

One approach I have considered is to write an escript that can be compiled and installed locally to manage these tasks. Our CD pipeline could then run the script(s) before deploying the updated Docker image.
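A minimal sketch of what such an escript could look like. `MigrateCLI`, its `up`/`down --step N` interface and `MyApp.Repo` are hypothetical names. It assumes Ecto 3 (`ecto_sql`) with Postgres, a repo that reads its connection settings at runtime, and a CD job that runs it from the app checkout, since an escript cannot easily read its own `priv/` directory.

```elixir
defmodule MigrateCLI do
  @moduledoc """
  Hypothetical escript entry point, built with
  `escript: [main_module: MigrateCLI]` in mix.exs.
  """

  @usage "usage: migrate up [--step N] | migrate down --step N"

  # escripts call main/1 with the raw command-line arguments.
  def main(argv) do
    case parse(argv) do
      {:ok, direction, opts} ->
        run(direction, opts)

      {:error, usage} ->
        IO.puts(:stderr, usage)
        System.halt(1)
    end
  end

  @doc "Maps argv to an Ecto.Migrator direction and options."
  def parse(argv) do
    {opts, args, invalid} = OptionParser.parse(argv, strict: [step: :integer])

    case {args, opts, invalid} do
      {["up"], [], []} -> {:ok, :up, [all: true]}
      {["up"], [step: n], []} when n > 0 -> {:ok, :up, [step: n]}
      # Rollbacks must always say how far to go.
      {["down"], [step: n], []} when n > 0 -> {:ok, :down, [step: n]}
      _ -> {:error, @usage}
    end
  end

  # Assumption: MyApp.Repo (hypothetical) gets its connection settings,
  # e.g. from DATABASE_URL, at runtime; migrations are read from the checkout.
  defp run(direction, opts) do
    for app <- [:postgrex, :ecto_sql], do: {:ok, _} = Application.ensure_all_started(app)
    {:ok, _} = MyApp.Repo.start_link(pool_size: 2)
    Ecto.Migrator.run(MyApp.Repo, Path.expand("priv/repo/migrations"), direction, opts)
  end
end
```

Requiring `--step` for `down` is a deliberate choice in this sketch, so a typo in CD cannot roll back everything.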

Approach two - Admin feature

We could implement an internal, developer-only API for starting migrations, rollbacks and so on. However, that requires deploying before I can actually run the newest migrations. Until the release is deployed and somebody starts the migrations, every endpoint would need precautions that check whether the migrations have run, since fields might still be missing from the database. That is why I prefer the first approach.

Combined

One could also combine these approaches for different cases.

Current solutions?

What are the current solutions people use? Are they adaptable to a cluster environment where you don’t always know the host machine (or at least shouldn’t have to know it)?


#2

I’ve always used hooks. If you do rolling updates, startup time doesn’t matter, and if the migrations are already up to date, running them adds a negligible amount to your startup time. You’ll also waste a lot of time trying to implement a perfect solution.


#3

Sure, if they’re up to date it won’t be that time consuming.

You’ll also waste a lot of time trying to implement a perfect solution.

I can imagine that easily happening, but I am not trying to find a perfect solution. I just feel I don’t have much control when migrations are part of starting the application and can’t be triggered any other way.

I have not even thought about this yet, but if I have to do a rollback, what happens to the migrations that were run by the failed release? I’ll need to look this up… :thinking:


#4

I haven’t deployed a Phoenix app yet or worked much with Kubernetes but is there no way to exec a command on demand to a specific service that’s already running?

That way your CI service in dev can run the exec command to auto-migrate, while in production the step isn’t in the pipeline and you run the exec command yourself whenever you want to migrate your DB.
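A sketch of that split as an Elixir helper (it could live in the escript or a Mix task): in dev the pipeline execs into a running pod, elsewhere it does nothing and someone runs the same command by hand. The `deploy/<name>` form makes `kubectl exec` pick a pod of the Deployment. `MigrateInPod`, the Deployment name and the environment names are hypothetical; the `rpc Elixir.Release.Tasks migrate` invocation is the Distillery 1 one from the release task posted later in this thread.

```elixir
defmodule MigrateInPod do
  @moduledoc """
  Hypothetical CI helper: migrates via `kubectl exec` into a running pod,
  automatically only in dev.
  """

  @doc "kubectl arguments that run the release task in one pod of `deployment`."
  def kubectl_args(deployment) do
    # `deploy/<name>` lets kubectl choose a pod of the Deployment.
    ["exec", "deploy/" <> deployment, "--",
     "bin/my_app", "rpc", "Elixir.Release.Tasks", "migrate"]
  end

  # `cmd` defaults to System.cmd/2 and can be swapped out in tests.
  def maybe_migrate(env, deployment, cmd \\ &System.cmd/2)

  def maybe_migrate("dev", deployment, cmd) do
    case cmd.("kubectl", kubectl_args(deployment)) do
      {_output, 0} -> :migrated
      {output, status} -> {:error, status, output}
    end
  end

  # beta/prod: a human runs the same command when ready.
  def maybe_migrate(_env, _deployment, _cmd), do: :skipped
end
```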

I’ve always decoupled database migrations from my deploy pipeline, because I do think it’s something that should run on demand because each migration isn’t created equal. Sometimes you want to take down the whole app and run a time consuming migration, but other times it’s ok to run a quick migration without taking everything down.


#5

I haven’t deployed a Phoenix app yet or worked much with Kubernetes but is there no way to exec a command on demand to a specific service that’s already running?

There is, but depending on how I set it up it might be a bit tricky. It is possible, though, to execute something in the running pod/container and open a remote_console there that connects to the running node.

I’ve always decoupled database migrations from my deploy pipeline, because I do think it’s something that should run on demand because each migration isn’t created equal. Sometimes you want to take down the whole app and run a time consuming migration, but other times it’s ok to run a quick migration without taking everything down.

This is mainly why I’m looking at alternatives right now.


#6

Which version of Distillery are you using right now, and may I ask what your hooks look like?

I’m having trouble porting the old hook I initially tried to Distillery 2.0. Either it tries to start the server all over again and fails with an “address already in use” error, or it cannot find the database or the schema_migrations table, even though I can clearly see it when I access the database myself. The service itself does not get errors like this.


#7

Probably not much help for you here sorry.

Right now I’m using Distillery 2 and the approach from its docs, but I’m not deploying to k8s at the moment.

When I was deploying to k8s I used Distillery 1 and the following:

rel/config.exs

environment :prod do
  #stuff
  set post_start_hook: "post_start" 
end

post_start

set +e

while true; do
  nodetool ping
  EXIT_CODE=$?
  if [ $EXIT_CODE -eq 0 ]; then
    echo "Application is up!"
    break
  fi
  sleep 1 # wait between pings instead of busy-looping while the node boots
done

set -e

echo "Running migrations"
bin/my_app rpc Elixir.Release.Tasks migrate
echo "Migrations run successfully"

lib/release_tasks.ex

defmodule Release.Tasks do
  @moduledoc """
  Handles migrations when releasing
  """
  def migrate do
    {:ok, _} = Application.ensure_all_started(:my_app)

    # use the same app name as above (the original post had a leftover app name here)
    path = Application.app_dir(:my_app, "priv/repo/migrations")

    Ecto.Migrator.run(MyApp.Repo, path, :up, all: true)
  end
end

Is this how you are doing it?


#8

My preferred way to run migrations is to divide them into two buckets.

  1. Schema changes
  2. Data changes

Schema changes are the default migrations in the Phoenix app, and I set up data changes to run from a separate directory. In K8s, I set up schema changes to run in an init container. In Ecto 3, I added a feature that locks the schema migrations table while migrations are running, so init containers in other pods block until the migrations have completed. Data migrations are run by a transient process that I start as part of the supervision tree. If you use the Ecto 3 migrator for this, it also ensures that only one of these runs across all pods.
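A minimal sketch of such a transient data-migration process, assuming Ecto 3 and a separate `priv/repo/data_migrations` directory. `MyApp.DataMigrator`, `child_for/2` and the directory name are illustrative, not taken from the post.

```elixir
defmodule MyApp.DataMigrator do
  @moduledoc """
  Hypothetical one-shot worker that runs data migrations after boot.
  """

  # restart: :transient restarts the task only if it exits abnormally;
  # a normal exit (all data migrations applied) leaves it stopped.
  use Task, restart: :transient

  def start_link(run_fun) when is_function(run_fun, 0) do
    Task.start_link(run_fun)
  end

  @doc "Child spec entry that runs every pending migration under `path` on `repo`."
  def child_for(repo, path) do
    {__MODULE__, fn -> Ecto.Migrator.run(repo, path, :up, all: true) end}
  end
end
```

In the application’s `start/2` it would sit after the repo and before the endpoint, e.g. `children = [MyApp.Repo, MyApp.DataMigrator.child_for(MyApp.Repo, Application.app_dir(:my_app, "priv/repo/data_migrations")), MyAppWeb.Endpoint]`. Because the task runs concurrently, the endpoint does not wait for data migrations to finish.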

Changes in the first group are required to be backwards compatible. That means only additions. Removals are allowed as long as the currently deployed application has already stopped using the thing being removed. Renames are not allowed, but can effectively be done by performing multiple simpler steps. This ensures that during a rolling deploy, the current app and the new app being deployed can run simultaneously.

Data migrations are written so they don’t add much load to the application. They’re also typically run outside of a transaction, so they have to be written such that running only part of the migration is fine, and if the migration starts running again, it picks up where it left off.
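A sketch of a resumable, low-load data migration in that style. The `users` table and the `first_name`, `last_name` and `full_name` columns are made up for illustration, and the SQL assumes Postgres. Each batch is its own statement with no surrounding transaction, and it only selects rows that still need work, so an interrupted run resumes where it stopped.

```elixir
defmodule MyApp.DataMigrations.BackfillFullName do
  @moduledoc """
  Hypothetical data migration: fills users.full_name in small batches.
  """

  @batch_size 1_000

  # concat_ws never returns NULL, so every updated row drops out of the
  # `full_name IS NULL` set and the loop is guaranteed to finish.
  @sql """
  UPDATE users SET full_name = concat_ws(' ', first_name, last_name)
  WHERE id IN (SELECT id FROM users WHERE full_name IS NULL LIMIT $1)
  """

  def run(repo, pause_ms \\ 100) do
    run_batches(fn -> update_batch(repo) end, pause_ms)
  end

  @doc "Calls `batch_fun` until it reports 0 updated rows, pausing between batches."
  def run_batches(batch_fun, pause_ms) do
    case batch_fun.() do
      0 ->
        :ok

      n when is_integer(n) and n > 0 ->
        # Leave room for regular traffic between batches.
        Process.sleep(pause_ms)
        run_batches(batch_fun, pause_ms)
    end
  end

  defp update_batch(repo) do
    %{num_rows: n} = Ecto.Adapters.SQL.query!(repo, @sql, [@batch_size])
    n
  end
end
```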

I do these things because I write applications where zero downtime is important. If that’s not a requirement, you could probably get away with something simpler.


#9

I have experience with this from previous projects, and it is roughly what I’m aiming for as well. I don’t think I’ll have data changes, though. My plan for handling breaking changes is to create a new version of the resource, so that whenever a user upgrades their client, it is automatically converted to the new version, or something like that.

I find this intriguing. I have not used init containers myself yet, so I wonder how you build one in your case. Since it needs the migrations and all, do you essentially make two containers with the application, one that is meant to run and one for the migrations? What I’m getting at is whether they are, or at least could be, identical except for the command(s) they run.


#10

I use the same image and specify the run command in the k8s config.
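With the same image and a different command, that command needs an entry point that migrates without booting the whole app. A sketch, assuming Ecto 3 with Postgres and the `:my_app` / `priv/repo/migrations` layout used in this thread; `MyApp.ReleaseTasks` is a hypothetical name, and how the container command invokes it (a Distillery custom command or similar) is not shown. Starting only the repos, not `:my_app` itself, also avoids binding the endpoint port a second time, which is one plausible cause of the “address already in use” error mentioned earlier.

```elixir
defmodule MyApp.ReleaseTasks do
  @moduledoc """
  Hypothetical one-off migration commands for an init container or Job.
  Loads the app's config but starts only the repos, never the endpoint.
  """

  @app :my_app
  # Assumption: Postgres via ecto_sql 3; adjust for other adapters.
  @start_apps [:crypto, :ssl, :postgrex, :ecto_sql]

  def migrate(_argv \\ []) do
    for repo <- start_repos() do
      Ecto.Migrator.run(repo, migrations_path(), :up, all: true)
    end

    :ok
  end

  # Rolls back the last `steps` migrations that exist on disk in this image.
  def rollback(steps \\ 1) when is_integer(steps) and steps > 0 do
    for repo <- start_repos() do
      Ecto.Migrator.run(repo, migrations_path(), :down, step: steps)
    end

    :ok
  end

  defp start_repos do
    # Load (not start) the app so its config and :ecto_repos are readable.
    case Application.load(@app) do
      :ok -> :ok
      {:error, {:already_loaded, @app}} -> :ok
    end

    Enum.each(@start_apps, fn app -> {:ok, _} = Application.ensure_all_started(app) end)

    repos = Application.get_env(@app, :ecto_repos, [])
    # A small pool is enough for a one-off command.
    Enum.each(repos, fn repo -> {:ok, _} = repo.start_link(pool_size: 2) end)
    repos
  end

  defp migrations_path, do: Application.app_dir(@app, "priv/repo/migrations")
end
```

On the earlier rollback question: the migrator only rolls back migrations whose files are in the given path, so `rollback/1` presumably has to run from the image that still contains the migrations being undone, before switching back to the old image.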


#11

We only do backwards-compatible migrations and run them as a Kubernetes Job with the “next” version of our Docker image.

We do deployments as updates to a CRD “MyApp” using the operator pattern. The operator runs the Job^1 on CRD update, waits for it to succeed and then updates the k8s Deployment, along with some other steps such as running Elasticsearch reindexes in a similar manner.

I recently released the tool we use internally, Bonny. It allows you to have Operators (CRD lifecycles) written in Elixir.

  1. Technically our migrations are CRDs around jobs. We have DBMigration and DBRollback CRDs so we can easily issue them from outside of an application deployment if need be.
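A sketch of the gating step described above, written as plain Elixir with the Kubernetes calls injected. This is not Bonny’s API and not the actual operator; `DeployGate` and the three `ops` functions are hypothetical stand-ins for a k8s client.

```elixir
defmodule DeployGate do
  @moduledoc """
  Hypothetical flow: run the migration Job with the next image, wait for
  it, and only then roll the Deployment.
  """

  @doc """
  `ops` needs `run_job.(image)` returning a job ref, `job_status.(job)`
  returning :running | :succeeded | :failed, and `update_deployment.(image)`.
  """
  def deploy(image, ops, opts \\ []) do
    interval = Keyword.get(opts, :interval_ms, 2_000)
    attempts = Keyword.get(opts, :attempts, 150)
    job = ops.run_job.(image)

    case await(job, ops.job_status, interval, attempts) do
      :succeeded -> ops.update_deployment.(image)
      # The old Deployment keeps running; the migrations are backwards compatible.
      other -> {:error, other}
    end
  end

  defp await(_job, _status, _interval, 0), do: :timeout

  defp await(job, status, interval, attempts) do
    case status.(job) do
      :succeeded ->
        :succeeded

      :failed ->
        :failed

      :running ->
        Process.sleep(interval)
        await(job, status, interval, attempts - 1)
    end
  end
end
```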

#12

Is this how you are doing it?

Yes, that was exactly how I was doing it, but thanks :slight_smile:


#13

This does sound a lot like what I’ve been looking for, so I’ll have a proper look when I have some time :slight_smile: Thanks!


#14

Feel free to ping me if you have any questions. I did a few intro blog posts on it.

I’m working on a basic Phoenix app demo to use in a post to show how we deploy. I’ll hopefully have it posted to Medium by next week-ish.


#15

Is the source for these available?


#16

Not yet. I’m working them into the demo/tutorial I’m currently writing.


#17

@blatyo sorry I suck. I was writing a blog post and felt like it was mostly about how to use kazan, and decided to try to patch kazan to incorporate some of the higher level k8s interactions, got lost in macroland and ended up writing my own k8s client. :smiley: (Check out my new book: ADHD in Action)

I still plan to write up a blog post on a general “elixir-deployment-operator”, but figured I’d share my notes in the meantime since I left a vaporware comment 4 weeks ago :smiley: