Prepopulating DB deployment questions

Hi I’ve been using a combination of guides to deploy mainly this one.

I need to prepopulate my database.


question 1: How do I do this?

Can I just use seed and do this.

Where I set the environment variable in my server like so?

MIX_ENV=prod mix run priv/repo/seeds.exs

The problem is I’m not entirely sure where is the seeds file path is on my server. I got a releases folder and 0.1.0.


Question 2: Every time I deploy, does it nuke and erase all data from the production?

If the answer is yes how do people work around this in term of work flow? Backing up the existing production data? Deploy, run the backup production data? It seems tedious.


My current solution is I’ve been ETL web scraped data into an SQL script for each table. I just scp the script onto production server. From the server I run the sql insert script on to the production database.

It’s pretty tedious and am looking for alternatives.

Thank you for your time.

Hi.

I’ve found a solution that may work with distillery so I’mma try it soon, as for the second question I can just deploy and see if it nuke the production data.

Thanks again

Haha, would be interesting design choices if any deployment destroyed prod data on redeploy :).

I think running seeds to initialize a data set is okish for the first run, but your deployment mechanism would obviously not want to run this each time or you with likely end up with duplicates (unless you code again this).

Generally what your asking though is quite a common problem, how to deal with data updates is afaik not something you get for free with ecto or any other ormy thing. Migrations here are more schema related and for good reason.

The problem is it’s not a trivial thing to do right, as there isn’t always an obvious rollback path. You may insert a load of records then decide you need to rollback after other things have happened in prod causing data to change.

As such you never really have rollbacks, just incremental data changes. And unlike migrations they are really only valid on the current dataset.

In some cases the changes may be deemed sevear enough to warrant bringing part of a system offline to apply them. Other times you can make small changes without doing such things, or potentially make changes incrementally if you have clustered data stores.

Basically there is no silver bullet here, but you want to be looking into automation deployment tools as much as possible to at least aid with this.

If your logging on to prod servers to run adhoc scripts then you really need to rethink this once your scale or data becomes more important.

1 Like