How to generate index migration queries from resource at runtime?

I have a system that needs to bulk load a bunch of data from time to time.

One way to speed up that load process is to delete some of the indexes for that table and then, after the load is done, re-create them.

I can do that manually via psql, but an Ash Resource already has all the information about its indexes, so I was wondering if there is some way for me to generate, at runtime, the same queries that mix ash_postgres.generate_migrations generates for the resource indexes.

In other words, I want to generate create index ... and drop index ... queries for each index and identity I have in my resource, so I can execute them at runtime using Repo.query.

:thinking: I have no idea if this is really what you’ll end up wanting, but we do have some tooling around this kind of thing. What you can do is this:

snapshot =
  YourApi
  |> AshPostgres.MigrationGenerator.take_snapshots(YourRepo, [TheResource])
  |> Enum.at(0)

snapshot_without_indices =
  some_custom_code_to_remove_indices

operations = AshPostgres.MigrationGenerator.get_operations_from_snapshots([snapshot], [snapshot_without_indices])

{up, down} = AshPostgres.MigrationGenerator.build_up_and_down(operations)

# do something with up/down here

Thinking about this, I don’t see how you could really do what you want that way…we generate Ecto migration code. You’d have to define an Ecto migration module and run it. If that’s on the table, then you could use up to remove everything and down to add them back :person_shrugging:
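For illustration, here is a hedged sketch of that route. The module name and version number are made up, and the index call mirrors what the generator emits for the single-identity case discussed later in this thread:

```elixir
# Hypothetical sketch (names are made up): wrap the generated code in a
# regular Ecto migration and run it with Ecto.Migrator. `up` drops the
# index before the bulk load; rolling back re-creates it afterwards.
defmodule MyApp.Repo.Migrations.DropEntityIndexes do
  use Ecto.Migration

  def up do
    drop_if_exists unique_index(:entities, [:id], name: "entities_unique_id_index")
  end

  def down do
    create unique_index(:entities, [:id], name: "entities_unique_id_index")
  end
end

# Before the load (version number is arbitrary but must be consistent):
#   Ecto.Migrator.up(MyApp.Repo, 20240101000000, MyApp.Repo.Migrations.DropEntityIndexes)
# After the load:
#   Ecto.Migrator.down(MyApp.Repo, 20240101000000, MyApp.Repo.Migrations.DropEntityIndexes)
```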

1 Like

Sorry, I was only able to come back to this today…

So, when I run AshPostgres.MigrationGenerator.get_operations_from_snapshots, I get the following error:

iex(114)> operations = AshPostgres.MigrationGenerator.get_operations_from_snapshots([snapshot], [snapshot_without_indices])
** (FunctionClauseError) no function clause matching in String.to_atom/1    
    
    The following arguments were given to String.to_atom/1:
    
        # 1
        :unique_full_name
    
    Attempted function clauses (showing 1 out of 1):
    
        def to_atom(string) when is_binary(string)
    
    (elixir 1.16.0) lib/string.ex:2736: String.to_atom/1
    (elixir 1.16.0) lib/map.ex:916: Map.update!/3
    (ash_postgres 1.3.62) lib/migration_generator/migration_generator.ex:3092: AshPostgres.MigrationGenerator.load_identity/2
    (elixir 1.16.0) lib/enum.ex:1700: Enum."-map/2-lists^map/1-1-"/2
    (elixir 1.16.0) lib/map.ex:916: Map.update!/3
    (ash_postgres 1.3.62) lib/migration_generator/migration_generator.ex:2912: AshPostgres.MigrationGenerator.sanitize_snapshot/1
    (elixir 1.16.0) lib/enum.ex:1700: Enum."-map/2-lists^map/1-1-"/2
    (ash_postgres 1.3.62) lib/migration_generator/migration_generator.ex:85: AshPostgres.MigrationGenerator.get_operations_from_snapshots/3

It seems like the function expects the attribute’s source field to be a string, but in this case it is an atom. There are other fields that will have the same problem. Maybe I need to run another intermediary function to correctly normalize the snapshot for that function?

Hmm… Yeah, try JSON encoding and decoding it first. You probably shouldn’t have to do that, it should be done for you by take_snapshots I think? Either way, I think that might do it.
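If it helps, a sketch of that round trip (assuming Jason is a dependency; note a module atom like Pacman.Repo would encode as "Elixir.Pacman.Repo", so this may not be a drop-in fix):

```elixir
# Hedged sketch of the JSON round-trip idea, assuming Jason is available.
# Encoding turns atom values into strings; decoding with keys: :atoms keeps
# the map keys as atoms, which is the shape take_snapshots produces.
stringified =
  snapshot
  |> Jason.encode!()
  |> Jason.decode!(keys: :atoms)
```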

Hey @zachdaniel , here is a quick update regarding this issue:

I created a small resource to make troubleshooting easier:

defmodule Pacman.Markets.Entity2 do
  @moduledoc false

  use Ash.Resource,
    data_layer: AshPostgres.DataLayer

  attributes do
    uuid_primary_key :id
  end

  postgres do
    table "entities"

    repo Pacman.Repo
  end

  identities do
    identity :unique_id, [:id]
  end
end

When I run the take_snapshots code, I get:

iex(87)> snapshot = Pacman.Markets |> AshPostgres.MigrationGenerator.take_snapshots(Pacman.Repo, [Pacman.Markets.Entity2]) |> Enum.at(0)
%{
  attributes: [
    %{
      default: "fragment(\"uuid_generate_v4()\")",
      size: nil,
      type: :uuid,
      source: :id,
      references: nil,
      primary_key?: true,
      allow_nil?: false,
      generated?: false
    }
  ],
  table: "entities",
  hash: "8123997EA2DF402FDFB3E8145AF8666DCD0894EEF02C235DE06B42D47AF3457A",
  repo: Pacman.Repo,
  schema: nil,
  identities: [
    %{
      name: :unique_id,
      keys: [:id],
      index_name: "entities_unique_id_index",
      all_tenants?: false,
      base_filter: nil
    }
  ],
  base_filter: nil,
  multitenancy: %{global: nil, attribute: nil, strategy: nil},
  check_constraints: [],
  custom_indexes: [],
  custom_statements: [],
  has_create_action: false
}

If I just run the rest of the code, I get the error shown above. To make it work, I needed to make the following changes:

  1. Empty the attributes array, since I don’t need it here and it breaks the call;
  2. Change the repo field value from an atom to a string;
  3. In the identities field, change the name field value from an atom to a string;
  4. In the identities field, change the keys field values from atoms to strings.
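For reference, those four manual steps can be sketched as a plain-Elixir normalization (a hypothetical helper, not part of AshPostgres):

```elixir
# Hypothetical helper mirroring the manual changes above; not part of
# AshPostgres. It empties :attributes and stringifies the fields that
# currently arrive as atoms.
normalize = fn snapshot ->
  snapshot
  # 1. drop the attributes, since they aren't needed here and they break the call
  |> Map.put(:attributes, [])
  # 2. Pacman.Repo -> "Pacman.Repo"
  |> Map.update!(:repo, fn repo -> repo |> Module.split() |> Enum.join(".") end)
  |> Map.update!(:identities, fn identities ->
    Enum.map(identities, fn identity ->
      identity
      # 3. :unique_id -> "unique_id"
      |> Map.update!(:name, &Atom.to_string/1)
      # 4. [:id] -> ["id"]
      |> Map.update!(:keys, fn keys -> Enum.map(keys, &Atom.to_string/1) end)
    end)
  end)
end
```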

After these changes, my snapshot changed to the following one:

%{
  attributes: [],
  table: "entities",
  hash: "8123997EA2DF402FDFB3E8145AF8666DCD0894EEF02C235DE06B42D47AF3457A",
  repo: "Pacman.Repo", # CHANGED
  schema: nil,
  identities: [
    %{
      name: "unique_id", # CHANGED
      keys: ["id"], # CHANGED
      index_name: "entities_unique_id_index",
      all_tenants?: false,
      base_filter: nil
    }
  ],
  base_filter: nil,
  multitenancy: %{global: nil, attribute: nil, strategy: nil},
  check_constraints: [],
  custom_indexes: [],
  custom_statements: [],
  has_create_action: false
}

After these changes, the rest of the code works great:

iex(90)> snapshot_without_indices = %{snapshot | custom_indexes: [], identities: []}
%{
  attributes: [],
  table: "entities",
  hash: "8123997EA2DF402FDFB3E8145AF8666DCD0894EEF02C235DE06B42D47AF3457A",
  repo: "Pacman.Repo",
  schema: nil,
  identities: [],
  base_filter: nil,
  multitenancy: %{global: nil, attribute: nil, strategy: nil},
  check_constraints: [],
  custom_indexes: [],
  custom_statements: [],
  has_create_action: false
}
iex(91)> 
nil
iex(92)> operations = AshPostgres.MigrationGenerator.get_operations_from_snapshots([snapshot], [snapshot_without_indices])
[
  %AshPostgres.MigrationGenerator.Operation.RemoveUniqueIndex{
    identity: %{
      name: :unique_id,
      keys: [:id],
      index_name: "entities_unique_id_index",
      all_tenants?: false,
      base_filter: nil
    },
    schema: nil,
    table: "entities",
    multitenancy: %{global: nil, attribute: nil, strategy: nil},
    old_multitenancy: %{global: nil, attribute: nil, strategy: nil},
    no_phase: true
  }
]
iex(93)> 
nil
iex(94)> {down, up} = AshPostgres.MigrationGenerator.build_up_and_down(operations)
{"drop_if_exists unique_index(:entities, [:id], name: \"entities_unique_id_index\")\n",
 "create unique_index(:entities, [:id], name: \"entities_unique_id_index\")\n"}

So, the problem is that AshPostgres.MigrationGenerator expects some values to be strings and tries to convert them to atoms, but in this case they are already atoms.

An easy solution for that is just to add the following function to AshPostgres.MigrationGenerator:

  defp maybe_to_atom(value) when is_atom(value), do: value
  defp maybe_to_atom(value), do: String.to_atom(value)

And replace all String.to_atom calls in that module with that function.
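A quick illustration of why that change is safe (wrapped in a throwaway module here just to make it runnable):

```elixir
# Throwaway demo module: the proposed helper is a no-op for atoms and
# converts strings, so both snapshot shapes produce the same result.
defmodule MaybeToAtomDemo do
  def maybe_to_atom(value) when is_atom(value), do: value
  def maybe_to_atom(value), do: String.to_atom(value)
end

MaybeToAtomDemo.maybe_to_atom(:unique_id)   # => :unique_id
MaybeToAtomDemo.maybe_to_atom("unique_id")  # => :unique_id
```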

After doing that, the original snapshot works great.

Would you accept a PR with that change?

1 Like

I’d accept that PR, yeah :slight_smile:

Here it is: feat: Make MigrationGenerator accept atoms by sezaru · Pull Request #201 · ash-project/ash_postgres · GitHub

One of the tests failed, but I don’t think it has anything to do with the PR.

Thanks @zachdaniel for merging the PR! Now the snapshot code works as intended :slight_smile:

I do have another question though: one of the things I also want to include in my changes is disabling/enabling the primary key constraint.

Basically I want to generate something that will do the following:

-- down
ALTER TABLE temp.entities DROP CONSTRAINT "entities_pkey";

-- up
ALTER TABLE temp.entities ADD PRIMARY KEY (id);

What I tried was to find the primary key attribute and set its primary_key? field to false in the second snapshot:

snapshot = Markets |> MigrationGenerator.take_snapshots(Repo, [Entity2]) |> Enum.at(0)

[pk] = snapshot.attributes |> Enum.filter(fn %{primary_key?: pk?} -> pk? end)

snapshot = %{snapshot | attributes: [%{pk | primary_key?: true}]}

snapshot_without_indices = %{snapshot | custom_indexes: [], identities: []}

snapshot_without_indices = %{snapshot_without_indices | attributes: [%{pk | primary_key?: false}]}

operations = MigrationGenerator.get_operations_from_snapshots([snapshot], [snapshot_without_indices])

{down, up} = MigrationGenerator.build_up_and_down(operations)

This correctly generates the up and down code to drop and recreate the primary key, but it also adds a drop constraint call to both:

# up
drop constraint("entities", "entities_pkey")

alter table(:entities) do
  modify :id, :uuid, primary_key: true
end

# down
drop constraint("entities", "entities_pkey")

alter table(:entities) do
  modify :id, :uuid, primary_key: false
end

Any idea why that drop constraint("entities", "entities_pkey") shows in both up and down migrations?

:thinking: that seems like a bug.

Here is a quick code snippet that will reproduce the issue in case you want to confirm if it is a bug or not:

snapshot = %{
  attributes: [
    %{
      default: "fragment(\"uuid_generate_v4()\")",
      size: nil,
      type: :uuid,
      source: :id,
      references: nil,
      primary_key?: true,
      allow_nil?: false,
      generated?: false
    }
  ],
  table: "entities",
  hash: "CA477683CD2E4677DB771BC11752CD755DD54D2238502F106CE6EBE4E6FF834F",
  repo: Project.Repo,
  schema: nil,
  multitenancy: %{global: nil, attribute: nil, strategy: nil},
  base_filter: nil,
  check_constraints: [],
  custom_indexes: [],
  custom_statements: [],
  identities: [],
  has_create_action: false
}

snapshot_without_indices = %{
  attributes: [
    %{
      default: nil,
      size: nil,
      type: :uuid,
      source: :id,
      references: nil,
      primary_key?: false,
      allow_nil?: false,
      generated?: false
    }
  ],
  table: "entities",
  hash: "CA477683CD2E4677DB771BC11752CD755DD54D2238502F106CE6EBE4E6FF834F",
  repo: Project.Repo,
  schema: nil,
  multitenancy: %{global: nil, attribute: nil, strategy: nil},
  base_filter: nil,
  check_constraints: [],
  custom_indexes: [],
  custom_statements: [],
  identities: [],
  has_create_action: false
}

operations = AshPostgres.MigrationGenerator.get_operations_from_snapshots([snapshot], [snapshot_without_indices])

{down, up} = AshPostgres.MigrationGenerator.build_up_and_down(operations)

IO.puts("===============================================")
IO.puts(down)
IO.puts("===============================================")
IO.puts(up)

I’ve got an idea here. It’s a bit complicated: we do need to drop the primary key in cases where we are adding/removing an attribute to it, but I think the problem is that in the case of removing a primary key we shouldn’t drop it in the up.

I’ve pushed something to ash_postgres main to address this.

Thanks @zachdaniel, now the migration is perfect!

Also, just in case someone wants to do something similar, this is the code I’m using to actually run the up and down code generated above:

  import Ecto.Migration

  args = {self(), Repo, Repo.config(), nil, :forward, :down, %{level: :info, sql: true}}

  {:ok, runner} =
    DynamicSupervisor.start_child(Ecto.MigratorSupervisor, {Ecto.Migration.Runner, args})

  Ecto.Migration.Runner.metadata(runner, [])

  # NOTE: This will run the down migration
  Code.eval_string(down, [], __ENV__)
  Ecto.Migration.Runner.flush()

  # NOTE: This will run the up migration
  Code.eval_string(up, [], __ENV__)
  Ecto.Migration.Runner.flush()

  Agent.stop(runner)
1 Like