simpers

Preferred way to run Ecto Migrations in Kubernetes cluster

Some environment info

We are hosting our Phoenix application on a Kubernetes cluster using a Dockerfile based on the one in Distillery’s documentation.

We use Peerage to discover other nodes of this application, which solves node discovery when dynamically scaling the cluster up or down. The nodes therefore know about each other, so Phoenix Channels can send events to all nodes.

Other perhaps useful info?

There’s a CI/CD pipeline for each of dev, beta and prod. In dev we’d want migrations to run automatically, while in beta and prod we might want other settings or steps for this.

Hooks don’t work

The previous solution for running Ecto migrations was a pre-start hook in rel/config.exs that ran the migrations before starting the application. However, this runs them every time a node starts or restarts, which is not what I want.

We do not want migrations to be part of any boot-up sequence, both to minimize boot-up times and to make sure that we can control them manually, should we need to.

Goal

Find a flexible and easy way to run Ecto migrations without adding hooks to the release that would run on each pod start and slow down the pod’s start-up time.

Approach one - Escript

One approach I have considered would be to write an escript that could be compiled and installed locally to manage these tasks, which could then be used in our CD by running the script(s) before deploying the updated docker image.
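As a point of comparison, here is a minimal sketch of a related technique: not an escript, but a small module inside the release that a CD step could call before rolling out the new image. It assumes a recent ecto_sql 3.x (for `Ecto.Migrator.with_repo`); the module name `MyApp.ReleaseTasks` and the app name `:my_app` are placeholders, not names from this project.

```elixir
defmodule MyApp.ReleaseTasks do
  @moduledoc "Runs Ecto migrations without starting the whole application."

  # Hypothetical OTP app name; the repos are read from its :ecto_repos config.
  @app :my_app

  def migrate do
    load_app()

    for repo <- repos() do
      # with_repo starts the repo, runs the given function, then stops the repo.
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end

  def rollback(repo, version) do
    load_app()
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version))
  end

  defp repos, do: Application.fetch_env!(@app, :ecto_repos)

  # Loads the app's configuration without starting its supervision tree.
  defp load_app, do: Application.load(@app)
end
```

With Elixir's built-in releases this could be invoked as `bin/my_app eval "MyApp.ReleaseTasks.migrate()"`. A Distillery release would need its own custom command wrapping the same call.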

Approach two - Admin feature

We could implement an internal API, restricted to devs, from which one can start migrations, roll back and so on. However, it would require deploying before the newest changes can actually be migrated. Until somebody starts the migrations after the deploy, every endpoint would need some kind of precaution that checks whether the migrations have been run, as fields might still be missing from the database. This is why I would prefer the first approach.

Combined

One could also combine these approaches for different cases.

Current solutions?

What are the current solutions people use? Are they adaptable to a cluster environment where you don’t always know the host machine (or at least shouldn’t have to know it)?

16 comments

/phoenix #ecto #migrations #escript


2019-02-05 19:52:11 UTC

Most Liked

blatyo

Conduit Core Team

My preferred way to run migrations is to divide them into two buckets.

1. Schema changes
2. Data changes

Schema changes are the default migrations in the Phoenix app, and I set up data changes to run from a separate directory. In K8s, I set up schema changes to run in an init container. In Ecto 3, I added the feature that locks the schema migrations table while migrations are running, so init containers in other pods will block until migrations have completed. I start a transient process as part of the supervision tree that runs data migrations. If you use the Ecto 3 migrator for this, it’ll also ensure that only one of these runs across all pods.
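A rough sketch of what such a transient process might look like, assuming ecto_sql 3.x. The module names, the `:my_app` app name and the `priv/repo/data_migrations` directory are hypothetical and not taken from the post above.

```elixir
defmodule MyApp.DataMigrator do
  # :transient means the supervisor restarts this task only if it exits abnormally.
  use Task, restart: :transient

  def start_link(_arg) do
    Task.start_link(__MODULE__, :run, [])
  end

  def run do
    # Hypothetical directory, kept separate from priv/repo/migrations.
    path = Application.app_dir(:my_app, "priv/repo/data_migrations")
    Ecto.Migrator.run(MyApp.Repo, path, :up, all: true)
  end
end

# In MyApp.Application.start/2, list it after the repo so the repo is running first:
#
#   children = [MyApp.Repo, MyApp.DataMigrator, MyAppWeb.Endpoint]
```

Because `Task.start_link/3` returns as soon as the task process is spawned, the rest of the supervision tree keeps starting while the data migrations run in the background.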

Changes in the first group are required to be backwards compatible. That means only additions. Removals are allowed as long as the currently deployed application has already stopped using the thing being removed. Renames are not allowed, but can effectively be done by performing multiple simpler steps. All of this ensures that during a rolling deploy, the current app and the new app being deployed can run simultaneously.
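To illustrate the "multiple simpler steps" idea, here is a sketch of the first step of a rename done as additive changes. The `users` table and the `name` and `full_name` columns are made up for the example.

```elixir
defmodule MyApp.Repo.Migrations.AddFullNameToUsers do
  use Ecto.Migration

  # Step 1 of a multi-deploy "rename" of users.name to users.full_name
  # (hypothetical table and columns):
  #   1. add the new column (this migration); the old code ignores it
  #   2. deploy code that writes both columns, then backfill full_name
  #   3. deploy code that reads and writes only full_name
  #   4. drop the old column in a later migration
  def change do
    alter table(:users) do
      add :full_name, :string
    end
  end
end
```

Each step on its own is either an addition or a removal of something no deployed code still uses, so old and new pods can run side by side during every rollout.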

Data migrations are written in such a way that they do not add much load to an application. Also, they’re typically run outside of a transaction. So they have to be written in such a way that it is OK if only part of the migration ran, and if the migration were to start running again, it would pick up where it left off.
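One way such a resumable data migration could be written is sketched below, assuming ecto_sql 3.x and PostgreSQL. The `users` table, the `status` column, the module name, the batch size and the pause are all hypothetical values chosen for illustration.

```elixir
defmodule MyApp.Repo.DataMigrations.BackfillUserStatus do
  use Ecto.Migration
  import Ecto.Query

  # Run outside a wrapping transaction so each batch commits on its own.
  @disable_ddl_transaction true
  @batch_size 1_000
  @pause_ms 200

  def up, do: backfill_batch()

  # Nothing to undo: the backfill only fills in missing values.
  def down, do: :ok

  defp backfill_batch do
    # Only rows that still need the change are selected, so a rerun after a
    # crash continues with whatever is left.
    pending =
      from(u in "users",
        where: is_nil(u.status),
        select: u.id,
        limit: @batch_size
      )

    ids = repo().all(pending)

    if ids != [] do
      repo().update_all(from(u in "users", where: u.id in ^ids), set: [status: "active"])
      # Pause between batches to keep the extra database load low.
      Process.sleep(@pause_ms)
      backfill_batch()
    end
  end
end
```

The migration lock is left enabled here, so the migrator still prevents two pods from running the same backfill at once.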

I do these things because I write applications where zero downtime is important. If that’s not a requirement, you could probably get away with something simpler.

Post #8

coryodaniel

We only do backwards-compatible migrations and run them as a Kubernetes Job with the “next” version of our Docker image.

We do deployments as updates to a CRD “MyApp” using the operator pattern. It runs the Job on CRD update, waits for it to succeed and then updates the k8s Deployment, along with some other stuff like running Elasticsearch reindexes in a similar manner.

I recently released the tool we use internally, Bonny. It allows you to have Operators (CRD lifecycles) written in Elixir.

Technically our migrations are CRDs around jobs. We have DBMigration and DBRollback CRDs so we can easily issue them from outside of an application deployment if need be.

Post #11

sanswork

I’ve always used hooks. If you do rolling updates, start-up time doesn’t matter, and if the migrations are up to date, the time to run them will have a negligible effect on your start-up time. You’ll also waste a lot of time trying to implement a perfect solution.

Post #2