Squashing Schema Migrations

Anyone who has worked on an old code-base is familiar with this problem: the migrations folder can get huge. The database your app started with may bear little resemblance to the database it uses now.

I’ve been reading about how to “squash” migrations in other frameworks… the idea is pretty simple: take a single snapshot of your database schema as the new starting point and trash all of your old migrations. It reminds me of the “squash and merge” option when you close a Github pull request.

How would one do this in Elixir and Ecto? Has anyone written an article or documentation page on this perhaps?

Thanks!

I have not tried this, but you can maybe dump the ecto schema using mix ecto.dump and load it using mix ecto.load and run new migrations from then onwards?

4 Likes

@fireproofsocks That’s what I’m doing over here.

You might find this blog post interesting: https://ulisses.dev/elixir/2019/05/13/how-to-create-a-ecto-setup-pipeline-with-ecto-3-1-2.html

2 Likes

One question though: what is the main intention of doing that? To reduce the build time by cutting off some steps and going straight to the final db state? Or to remove he migration files at all?

I ask because I think the second option is not reasonable. Having the old files there might help you in the future for some reason, like knowing how some data got o the state it is for example. Anyway, doing the dump and load strategy you would be able to still leave the migrations files there and it will totally work.

The reason is mostly what I would dub “housekeeping”. I’m remembering one repo in particular I worked on that had so many migrations that I had to remember not to open that folder because it had so many files it could crash the IDE. Another more common scenario arises from sloppy migrations (I’ve been guilty!) that worked for the immediate updates but somehow broke the ability to replay all the migrations from start to finish.

If historical transparency is required, I’m happy to leave that to Git. I just want to be able to reset things to a simpler representation.

3 Likes

Depends on the changes. Usually, when you have to go into a legacy code you have to run the migrations from scratch. Happens sometimes the migrations contains some data migration because the structure changed and you have to convert that data. If you used part or something from the schema files and if future revisions that changed again, you’ll discover you cannot run that from scratch anymore.

That in itself is an interesting problem… Where to do data migrations. Unlike schema migrations where the state is know by applying previous migrations, data migrations are really usually only relevant at there execution time andgenerally are useless after usage. Currently I’m just creating throw away mix tasks for such things but I feel I’m missing a trick here, though really don’t think that normal schema migrations are the place for these kind of things

I keep data migrations with the schema migrations themselves. There is no logical difference between updating a database structure and updating its data.

The idea is that I can go back to an old commit, load an old database dump or simply starting from scratch with mix do ecto.create, ecto.migrate and I have a running application.

It’s perfectly fine to run a Repo.update in a migration instead of some alter table.

1 Like

It might work, but I wouldn’t call it perfectly fine. Starting your repo might depend on a certain db state, which your migration is supposed to create, but didn’t yet. This is kind of a circular dependency. Imo migrations should rather use execute(sql) and be able to run without dependencies.

Yes and no: the point of managing data migrations with schema migrations is that they’re ordered and have a timestamp, so unless there’s something very wrong, migration B runs after migration A has taken place, and cannot depend on migration C.

Of course, any data migration must be able to work with any data, whether a table is empty or full of rows to update.

EDIT: re-reading your comment I think I understand what you mean: the actual app cannot start if it requires some database state. That might happen, though it’s rare in practice, at least in the apps I have developed.

Imagine me adding a new field to MySchema and some other changes.

If I do Repo.update_all(MySchema, …) in the migration before the field is added it’ll crash.
One could say it’s fine to use Repo just with table names, but then again you’re still starting your whole application just to have slightly more convenient api around queries.