Why does this code work as 2 separate migrations, but not as a single one?

Hi all, new to Elixir, going through the “Programming Phoenix LiveView” book right now. Fairly new to working with Ecto as well :slight_smile:

An exercise in the chapter instructs the reader to add a username field to the User struct and implement the Ecto migrations to support it.

Additionally, I would like the username to be:

  • mandatory
  • unique
  • populated with the user’s email as the default value for existing DB records.

I’ve achieved it successfully with the following migrations:

First migration: create column:
  def change do
    alter(table(:users)) do
      add :username, :citext
    end
  end
Second migration: fill the column and make it unique/mandatory:
  def change do
    from(u in Pento.Accounts.User, update: [set: [username: u.email]])
    |> Pento.Repo.update_all([])

    alter(table(:users)) do
      modify :username, :citext, null: false
    end

    create(unique_index(:users, :username))
  end

However, I can’t seem to make these updates work as a single migration. Why?

My first draft looked like this:

defmodule Pento.Repo.Migrations.CreateUsernameColumn do
  use Ecto.Migration
  import Ecto.Query, only: [from: 2]

  def change do
    alter(table(:users)) do
      add :username, :citext
    end

    from(u in Pento.Accounts.User, update: [set: [username: u.email]])
    |> Pento.Repo.update_all([])

    alter(table(:users)) do
      modify :username, :citext, null: false
    end

    create(unique_index(:users, :username))
  end
end

… but it throws an error on the from(u in... statement saying the users table doesn’t have a column called username. But I just created that column a few lines above. What am I missing?

P.S. If it helps, complete repo here: Ch2/add username field by iarekk · Pull Request #7 · iarekk/programming_phoenix_liveview · GitHub

change/0 is a simplification of the separate up and down migrations. Note that you want to be able to reverse the migration as well. In any case, I’m not sure, but I think change is executed as a single transaction, meaning the first alter table isn’t actually applied until everything after it succeeds, which it can’t, because the first alter table hasn’t been applied yet.
edit: catch-22? :cowboy_hat_face:
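
For reference, the explicit up/down form of your first migration would look something like this (just a sketch to illustrate what change/0 simplifies, not tested):

defmodule Pento.Repo.Migrations.CreateUsernameColumn do
  use Ecto.Migration

  # applied when migrating forward
  def up do
    alter table(:users) do
      add :username, :citext
    end
  end

  # applied when rolling back
  def down do
    alter table(:users) do
      remove :username
    end
  end
end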

It very well might be transaction related (everything in change is indeed run inside one). If you ever need to force pending commands to run mid-migration, you can use the flush() function. However, @iarekk, you really shouldn’t put schema names in migrations. If the schemas ever change in a way that older migrations don’t expect, the migrations will break. It’s actually best to keep data migrations completely out of schema migrations and use a different solution, but if you do include them, they should be written as raw SQL: execute("update users u set username = u.email") and are best written as separate migrations anyway, so they can be deleted once they’ve been run in production.
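
As a sketch of what such a raw-SQL data migration could look like (assuming Postgres and the column names from this thread; the module name is made up), execute/2 takes both an up and a down statement so it stays reversible:

defmodule Pento.Repo.Migrations.BackfillUsernames do
  use Ecto.Migration

  def change do
    execute(
      # up: copy the email into the new column
      "UPDATE users SET username = email WHERE username IS NULL",
      # down: undo the backfill
      "UPDATE users SET username = NULL"
    )
  end
end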


SQL works fine, but for completeness, one can write Ecto queries without depending on schemas as well. from u in "users", update: [set: [username: u.email]] should do just fine.


Thank you!

flush() works if I write the migration like this:

  def change do
    alter(table(:users)) do
      add :username, :citext
    end

    flush()

    from(u in Pento.Accounts.User, update: [set: [username: u.email]])
    |> Pento.Repo.update_all([])

    alter(table(:users)) do
      modify :username, :citext, null: false
    end

    create(unique_index(:users, :username))
  end

Fair point on the schemas: unlike migrations, the schema module only exists in its ‘latest’ version. I’ll need to clean up the code; really liking @LostKobrakai’s suggestion there.
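
For my own notes, the cleaned-up single migration would then look roughly like this (a sketch using the schemaless query plus repo() from Ecto.Migration instead of Pento.Repo; I haven’t run this exact version yet):

defmodule Pento.Repo.Migrations.CreateUsernameColumn do
  use Ecto.Migration
  import Ecto.Query, only: [from: 2]

  def change do
    alter(table(:users)) do
      add :username, :citext
    end

    # run the pending alter so the query below can see the new column
    flush()

    # schemaless query: targets the "users" table directly, not the User schema
    from(u in "users", update: [set: [username: u.email]])
    |> repo().update_all([])

    alter(table(:users)) do
      modify :username, :citext, null: false
    end

    create(unique_index(:users, :username))
  end
end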

On the topic of data updates and migrations, I’m not sure I understand.

AFAIK the following should be true (assume we have migrations m1..mN):

  1. Given an empty server, executing migrations m1..mN should give you a working database.
  2. Given a server with migrations m1..mK already applied, applying mK+1..mN should give you a working database.

Now, if migration mK introduces a column that can’t be null, it should also:

  • fill it with some data
  • OR specify a default

So I can’t really see how I could separate ‘schema updates’ from ‘data updates’ unless I sacrifice property 2 and make some mK into dead ends. By a dead end I mean that you have to reset your DB and start from scratch.

“Working database” can mean many things. E.g. you can do “deploy: create the new column, don’t depend on it”, “backfill the new column”, “deploy: start using the column”.

Unfortunately the usual way around it is multiple deploys, i.e. what @LostKobrakai is saying in terms of “don’t depend on it” → “start using column”. It can get super annoying, but I was just pointing it out as something to think about going forward. It all depends on your production setup too, but you can end up in a situation where you’ve backfilled the columns while old code is still running on a server; when that code tries to create a new record, it doesn’t even know about the new column and crashes. The best case scenario here is that the user gets annoyed and tries again, but depending on what data you are migrating it could be much worse.

Anyway, it’s likely a bit beyond the scope of the book, and if you are still in development this doesn’t matter; I just wanted to point it out.

OH. Right!

My approach until now has been: “Given a list of migrations and a database with schema K and with data, the migrations should bring the database to schema N plus consistent data”

Whereas I could just as easily say “Given an empty database with schema K, the migrations should bring the database to schema N”, and let the particular DB owner perform their own data manipulations.

The former feels way more shiny, but perhaps I’m asking too much of the migration scripts :smiley:

You can still run data migrations on your own. The important difference with separated data migrations is that they’re not part of deployments and are therefore less risky. They can take time, they can run in batches, they can run overnight, they can be aborted or reset, all those things, which are a hassle while doing a deployment.
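
As a rough sketch (module name and batch size made up), a standalone backfill could look something like this and be run whenever it suits you, completely outside of a deployment:

defmodule Pento.Backfill.Usernames do
  import Ecto.Query

  @batch_size 1000

  # Copies email into username for rows that still miss it, one batch at a
  # time, so the job can be stopped and resumed at any point.
  def run do
    batch =
      Pento.Repo.all(
        from u in "users", where: is_nil(u.username), select: u.id, limit: @batch_size
      )

    case batch do
      [] ->
        :done

      ids ->
        from(u in "users", where: u.id in ^ids, update: [set: [username: u.email]])
        |> Pento.Repo.update_all([])

        run()
    end
  end
end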

Yeah! I hadn’t thought that through, for two reasons:

  1. Having a DBA team
  2. Working with DynamoDB for most of our data needs.

Thank you for your help! I understand things a bit better now :slight_smile: