Patterns for making seeds idempotent?

Every freshly generated Phoenix project sports a seeds file priv/repo/seeds.exs and an accompanying mix alias ecto.setup to apply those seeds. AFAIU the seeds can only be applied once to a database.

Are there any established patterns to make the seeds idempotent? Is this a good idea at all?

I personally use seeds only for dummy data for development. If it changes I just drop and reapply. Everything else should imo be a (data) migration.

Usually I would just check whether the data already exists before inserting it. You could also define unique indexes and skip the insert when it would violate them.
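Both ideas can be sketched as a small helper for seeds.exs. This is only a sketch: `SeedHelper` and its function names are made up for illustration, the repo is passed in as an argument, and the second function assumes a unique index exists on the conflict target column.

```elixir
defmodule SeedHelper do
  # Hypothetical helper module, not part of Ecto or Phoenix.

  @doc "Inserts a row only when no row matching `lookup` exists yet."
  def insert_unless_exists(repo, schema, lookup, attrs \\ []) do
    case repo.get_by(schema, lookup) do
      # Nothing found: build the struct from lookup + extra attrs and insert it.
      nil -> repo.insert!(struct(schema, Keyword.merge(lookup, attrs)))
      # Already there: return the existing row and leave the table untouched.
      existing -> existing
    end
  end

  @doc "Inserts the record and lets a unique index silently skip duplicates."
  def insert_or_skip(repo, record, conflict_target) do
    repo.insert!(record, on_conflict: :nothing, conflict_target: conflict_target)
  end
end
```

In seeds.exs this might be called as `SeedHelper.insert_unless_exists(MyApp.Repo, MyApp.Accounts.User, [email: "admin@example.com"], name: "Admin")`, where `MyApp.Repo` and `MyApp.Accounts.User` are placeholders for your own modules.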

In general I agree with @LostKobrakai: usually seeds are either one-time or dev-only. I had a single situation where I made seeds idempotent, for a staging environment where dropping all the data was not possible.

You are totally right: when the seeds change, we should ecto.reset the database.

I am asking because of other considerations: I want to run a database preparation script automatically as part of a devcontainer creation process. But the database may exist already. Rails has a db:prepare command that is idempotent. If there was a mix/ecto/phoenix equivalent of that, I would be happy. See this other forum question about that. But if there is no such equivalent, my seeds should be idempotent, hence my question if there are established patterns for this.

FTR: I came up with this solution to mark the seeds as already applied from within seeds.exs, so that running that script is idempotent.

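```elixir
# priv/repo/seeds.exs
# Assumes `Repo` is aliased to your application's repo (e.g. `alias MyApp.Repo`).
defmodule SeedsIdempotencyHandling do
  import Ecto.Query, only: [from: 2]

  # Fake migration version stored in schema_migrations as an "already seeded" marker.
  @magic_marker_value 2000_00_00_00_00_00

  # Note: Ecto.Migration.SchemaMigration is an internal ecto_sql schema,
  # so this may break with future versions.
  def seeds_already_applied? do
    query = from m in Ecto.Migration.SchemaMigration, where: m.version == ^@magic_marker_value
    Repo.exists?(query)
  end

  # Raises if the marker row cannot be written, so a failure is not silently ignored.
  def mark_seeds_as_applied do
    Repo.insert!(%Ecto.Migration.SchemaMigration{
      version: @magic_marker_value,
      inserted_at: NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)
    })
  end
end

if SeedsIdempotencyHandling.seeds_already_applied?() do
  IO.puts("Seeds already applied, skipping...")
else
  # put your seeds here
  SeedsIdempotencyHandling.mark_seeds_as_applied()
end
```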

It’s likely too much machinery for a single seed, but if you have an ongoing need for idempotent seeding there’s phil_columns:

The mixture of snake and kebab case is making me uncomfortable.

I used seeds purely for bootstrapping (so run once). Any changes are added as data migrations as already pointed out.

For dev data I make custom mix tasks. For the reset task I truncate all the tables to empty them which is nice if you’re using postgres as you don’t have to close any running connections to reset (iex, psql, phx.server, etc).
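A reset task along those lines could look roughly like this. It is a sketch only: the task name `dev.reset`, the table list, and `MyApp.Repo` are placeholders, and the SQL assumes Postgres.

```elixir
defmodule Mix.Tasks.Dev.Reset do
  use Mix.Task

  @shortdoc "Empties the dev tables without dropping the database"

  # Hypothetical table list; schema_migrations is deliberately left out
  # so the migration state survives the reset.
  @tables ~w(users posts comments)

  # Builds one TRUNCATE statement covering all given tables (Postgres syntax).
  def truncate_sql(tables) do
    "TRUNCATE TABLE " <> Enum.join(tables, ", ") <> " RESTART IDENTITY CASCADE"
  end

  @impl Mix.Task
  def run(_args) do
    # Start the application so the repo is available.
    Mix.Task.run("app.start")
    # MyApp.Repo is a placeholder for your own repo module.
    MyApp.Repo.query!(truncate_sql(@tables))
  end
end
```

With a file like this under lib/mix/tasks/, the task would be run as `mix dev.reset`, and a second task could then insert the dev data.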

The simplest solution (not that I like it) would be to do it as a migration?

That’s the best project name ever.

Whoops, replied to the wrong post.

Hmm, but then you would have to take care that this special migration isn’t applied on prod, wouldn’t you?

Not really, phil_columns-ex.

Oh, wait you only want this for dev? Seed and mix ecto.reset…

Not quite there… I want to script the setup process (see this other post in the thread), and A) not do a full ecto.reset every time as well as B) not have to think about whether I can apply the seeds. Therefore, the setup process (= ecto.setup) needs to be idempotent, and therefore the seeds themselves need to be idempotent (the other steps already are).
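For reference, the relevant aliases in a generated Phoenix project look roughly like the fragment below (exact contents may differ between Phoenix versions). ecto.create and ecto.migrate can be rerun against an existing, up-to-date database, which leaves the seeds script as the only step that needs its own guard.

```elixir
# mix.exs (fragment, inside the project module)
defp aliases do
  [
    # Create the database if needed, run pending migrations, then run the seeds.
    "ecto.setup": ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"],
    # Drop everything and start over.
    "ecto.reset": ["ecto.drop", "ecto.setup"]
  ]
end
```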

Oh god. :laughing: