How do I run occasional tasks that update data in a database?

kepik919 · July 14, 2021, 7:43am

Ooccasionaly I need to run tasks that update data in a database. I may need to run them ever again, or not a new server. That is, I need to run them once and in a certain release only. And they should be outside of git index.

Some tutorials, for instance a one at dashbit.co, suggest that I run them with “custom migration”, in which a 2nd directory for migrations is created called “custom_migrations” and they’ll be run from there via Ecto.Migrator. But this will case a problem: I run all of the custom_migrations, then delete all of migration files (because I won’t need them anywhere else, not on a new server either, once I’ve run them), then create new ones when a need arises, and then Ecto.Migrator will complain about absense of the migrations that I’ve deleted.

I’m also aware of ./bin/my_app eval MyApp.Tasks.custom_task1 but it’s not convinient because I’ll have to call it manually and passing arguments to a function isn’t convinient via the command line.

What I want is: create a several files that I want to be run in this current release, once. Store them in a certain directory of an application. Deploy a application. They’ll get run automatically, probably on application boot and then I remove them. Then, after some time, I may want to create new ones and only those new ones will need to get run.

How to do this? What’s a recommended way in Ellixir/Phoenix?

dimitarvp · July 14, 2021, 8:07am

One good and reliable way to achieve that is to have a deployment script that could enqueue background jobs into your job queue (if you use one).

stefanchrobot · July 14, 2021, 9:11am

We’ve followed that approach for our data migrations and it works good so far. Our only issue is that our data migrations need to run the application code. Since migrations are scripts, they rot over time without you knowing, so we need to remove the migration files once in a while.

Here’s the partial output of mix ecto.migrations:

$ mix ecto.migrations --migrations-path=priv/repo/data_migrations

Repo: App.Repo

  Status    Migration ID    Migration Name
--------------------------------------------------
  up        20210511064356  ** FILE NOT FOUND **
  up        20210512072827  ** FILE NOT FOUND **
  up        20210513063626  ** FILE NOT FOUND **
  up        20210514093522  ** FILE NOT FOUND **
  up        20210518091202  ** FILE NOT FOUND **
  up        20210519094816  ** FILE NOT FOUND **
  up        20210519095658  cleanup_user_references
  up        20210519162228  ** FILE NOT FOUND **
  up        20210519162229  ** FILE NOT FOUND **
  up        20210531140549  ** FILE NOT FOUND **
  up        20210604160944  backfill_contexts
  up        20210616082543  ** FILE NOT FOUND **
  up        20210629000000  ** FILE NOT FOUND **
  up        20210629000001  ** FILE NOT FOUND **

We don’t see any errors during migrations. What’s the error that you’re getting?

LostKobrakai · July 14, 2021, 10:13am

To me data migration is part of the job of migrations, therefore I don’t use a separate mechanism for it unless outside constraints require it (e.g. time to completion being rather long).

Depending on the complexity one can get away with not needing application level code quite well on ecto. A bunch of ecto query or repo api can work with table names instead of schemas modules, so one can have a convenient api and still not depend on application level code, which can change over time.

kepik919 · July 15, 2021, 6:12am

(1) Technically no errors, but there’re “FILE NOT FOUND” ones. How will you know whether or not it’s really a migration file that’s missing for some reason and which is to be fixed, or is it one of the ones with tasks that you’ve deleted intentionally and which is not to be fixed?

(2) mix ecto.migrations --migrations-path=priv/repo/data_migrations

How does one run this command on a server on a release? For mix doesn’t included in a release

kepik919 · July 15, 2021, 6:14am

What do you mean?

What I mean is that, I may need to run such database data updates only once and only on a certain server and only in a certain release. Therefore, they can’t be migrations, because each particular data update may never get run ever again elsewhere.

stefanchrobot · July 15, 2021, 6:34am

What is the concern here? Why would the file get missing other than intentional deletion? Do you really even care that much given that you want to run it only once?

If you really need the extra safety, the deleted version is in VCS. You can move it to a different folder too (e.g. “completed_data_migrations”).

Using the approach from the Phoenix guide on releases:

def migrate_data do
    load_app()

    for repo <- repos() do
      path = Ecto.Migrator.migrations_path(repo, "data_migrations")
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :up, all: true))
    end
  end

kepik919 · July 15, 2021, 1:12pm

What are we talking about? A migration? Then not once - on a new server it’ll be run again.

stefanchrobot · July 15, 2021, 8:50pm

I don’t think I understand your use case. You want to run some code to migrate the data in the database once per new server? Do these servers share the same database?