Squashing Schema Migrations

fireproofsocks · October 18, 2019, 9:16pm

Anyone who has worked on an old code-base is familiar with this problem: the migrations folder can get huge. The database your app started with may bear little resemblance to the database it uses now.

I’ve been reading about how to “squash” migrations in other frameworks… the idea is pretty simple: take a single snapshot of your database schema as the new starting point and trash all of your old migrations. It reminds me of the “squash and merge” option when you close a Github pull request.

How would one do this in Elixir and Ecto? Has anyone written an article or documentation page on this perhaps?

Thanks!

apr · October 18, 2019, 9:34pm

I have not tried this, but you can maybe dump the ecto schema using mix ecto.dump and load it using mix ecto.load and run new migrations from then onwards?

seanwash · October 18, 2019, 9:58pm

@fireproofsocks That’s what I’m doing over here.

holgerwiehen · October 18, 2019, 11:13pm

You might find this blog post interesting: https://ulisses.dev/elixir/2019/05/13/how-to-create-a-ecto-setup-pipeline-with-ecto-3-1-2.html

kelvinst · October 20, 2019, 3:09pm

One question though: what is the main intention of doing that? To reduce the build time by cutting off some steps and going straight to the final db state? Or to remove he migration files at all?

I ask because I think the second option is not reasonable. Having the old files there might help you in the future for some reason, like knowing how some data got o the state it is for example. Anyway, doing the dump and load strategy you would be able to still leave the migrations files there and it will totally work.

fireproofsocks · October 21, 2019, 4:09pm

The reason is mostly what I would dub “housekeeping”. I’m remembering one repo in particular I worked on that had so many migrations that I had to remember not to open that folder because it had so many files it could crash the IDE. Another more common scenario arises from sloppy migrations (I’ve been guilty!) that worked for the immediate updates but somehow broke the ability to replay all the migrations from start to finish.

If historical transparency is required, I’m happy to leave that to Git. I just want to be able to reset things to a simpler representation.

manuel-rubio · March 22, 2020, 11:02am

Depends on the changes. Usually, when you have to go into a legacy code you have to run the migrations from scratch. Happens sometimes the migrations contains some data migration because the structure changed and you have to convert that data. If you used part or something from the schema files and if future revisions that changed again, you’ll discover you cannot run that from scratch anymore.

chrisjowen · March 24, 2020, 2:31am

That in itself is an interesting problem… Where to do data migrations. Unlike schema migrations where the state is know by applying previous migrations, data migrations are really usually only relevant at there execution time andgenerally are useless after usage. Currently I’m just creating throw away mix tasks for such things but I feel I’m missing a trick here, though really don’t think that normal schema migrations are the place for these kind of things

1player · March 24, 2020, 2:26pm

I keep data migrations with the schema migrations themselves. There is no logical difference between updating a database structure and updating its data.

The idea is that I can go back to an old commit, load an old database dump or simply starting from scratch with mix do ecto.create, ecto.migrate and I have a running application.

It’s perfectly fine to run a Repo.update in a migration instead of some alter table.

LostKobrakai · March 24, 2020, 5:29pm

It might work, but I wouldn’t call it perfectly fine. Starting your repo might depend on a certain db state, which your migration is supposed to create, but didn’t yet. This is kind of a circular dependency. Imo migrations should rather use execute(sql) and be able to run without dependencies.

1player · March 24, 2020, 5:41pm

Yes and no: the point of managing data migrations with schema migrations is that they’re ordered and have a timestamp, so unless there’s something very wrong, migration B runs after migration A has taken place, and cannot depend on migration C.

Of course, any data migration must be able to work with any data, whether a table is empty or full of rows to update.

EDIT: re-reading your comment I think I understand what you mean: the actual app cannot start if it requires some database state. That might happen, though it’s rare in practice, at least in the apps I have developed.

LostKobrakai · March 24, 2020, 5:57pm

Imagine me adding a new field to MySchema and some other changes.

If I do Repo.update_all(MySchema, …) in the migration before the field is added it’ll crash.
One could say it’s fine to use Repo just with table names, but then again you’re still starting your whole application just to have slightly more convenient api around queries.

TylerSustare · April 28, 2022, 12:24am

It’s essentially the reason behind this for rails if I’m understanding correctly. GitHub - jalkoby/squasher: Squasher - squash your old migrations in a single command

cblavier · June 1, 2022, 7:26am

This approach seems reasonable, I will try it to squash all my migrations before 2022:

I’ll let you know how it goes

crova · June 1, 2022, 7:57am

Came to post this here.
We use this approach on our project and it works nicely to us.

cblavier · June 1, 2022, 10:42am

Everything went smoothy, here is how I did it to squash everything until 2021.

MIX_ENV=test mix ecto.drop
MIX_ENV=test mix ecto.create
MIX_ENV=test mix ecto.migrate --to 20211223095738 # last 2021 migration
MIX_ENV=test mix ecto.dump -d priv/repo/schema-squash.sql
mkdir priv/repo/migrations_archive
for i in {2016..2021}; do mv repo/migrations/$i* priv/repo/migrations_archive; done
mix ecto.setup

And here is my mix alias in mix.exs:

defp aliases do
  [
     ...,
     "ecto.setup": [
        "ecto.create",
        "ecto.load -d priv/repo/schema-squash.sql -f --skip-if-loaded",
        "ecto.migrate"
      ]
  ]
end

EDITED after @LostKobrakai hint

LostKobrakai · June 1, 2022, 10:58am

Why not use mix ecto.load for loading the sql instead of manually doing so though psql?

cblavier · June 1, 2022, 11:32am

Maybe because I wasn’t aware of mix ecto.load

cblavier · June 1, 2022, 11:54am

I just edited my code snippet

manuel-rubio · December 11, 2022, 1:45pm

Adding more from @whatyouhide to this conversation Get Rid of Your Old Database Migrations – Andrea Leopardi