I want to set up preview environments (ad-hoc server + database for pending pull requests).
The main issue we will have to tackle is database seeding: I want it to be minimal to prevent huge bills and lengthy deployments.
I guess the best option would be a script to create a purged, anonymized seed from our production database. Is there any tooling to make it easier (dealing with foreign keys, …)?
I personally like avoiding production DB imitation and creating data from scratch, but I have never executed on this ideal.
Prod data keeps growing, which will make preview DB creation slower over time unless you trim some data as well as anonymize it. Maintaining a data set that is similar to production will keep you sharp about production data usage patterns.
On the flip side, prod imitation gives you a better testing ground for schema and data migrations.
Here's another approach you may want to consider: writing fake data functions (which I also use in tests anyway) that are simple wrappers around functions in contexts, e.g.:
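    # Creates a user with fake defaults. If creation fails and a username was
    # given explicitly, returns the existing user so seeds can be re-run.
    def fake_user!(opts) do
      custom_username = opts[:username]

      attrs =
        opts
        # Accept a keyword list or a map; Map.put_new_lazy/3 needs a map.
        |> Map.new()
        |> Map.put_new_lazy(:email, &Faker.Internet.email/0)
        |> Map.put_new_lazy(:username, &Faker.Internet.user_name/0)

      with {:ok, user} <- Users.create(attrs) do
        user
      else
        {:error, %Ecto.Changeset{}} when is_binary(custom_username) ->
          Users.by_username!(custom_username)
      end
    end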
And then I use phil_columns (on Hex) to write data migrations that call those functions.
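To make the create-or-fetch idea above concrete, here is a minimal, self-contained sketch that runs as a plain script. It is not the code from the post: `FakeUsers` is a hypothetical in-memory stand-in for a `Users` context (so no Ecto or Faker is needed), the generated usernames and emails are made up, and unlike the snippet above it falls back to the existing user whenever the username is taken.

```elixir
defmodule FakeUsers do
  # Hypothetical in-memory stand-in for a context such as Users; a real app
  # would go through Ecto and the database instead.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Mimics a unique constraint on :username.
  def create(%{username: username} = attrs) do
    Agent.get_and_update(__MODULE__, fn users ->
      if Map.has_key?(users, username) do
        {{:error, :username_taken}, users}
      else
        {{:ok, attrs}, Map.put(users, username, attrs)}
      end
    end)
  end

  def by_username!(username), do: Agent.get(__MODULE__, &Map.fetch!(&1, username))
end

defmodule Seeds do
  # Create the user, or return the existing one when the username is already
  # taken, so the seed script can be re-run against the same database.
  def fake_user!(opts) do
    attrs =
      opts
      |> Map.new()
      |> Map.put_new_lazy(:username, fn -> "user#{System.unique_integer([:positive])}" end)
      |> then(fn attrs -> Map.put_new(attrs, :email, "#{attrs.username}@example.test") end)

    case FakeUsers.create(attrs) do
      {:ok, user} -> user
      {:error, :username_taken} -> FakeUsers.by_username!(attrs.username)
    end
  end
end

{:ok, _pid} = FakeUsers.start_link()
first = Seeds.fake_user!(username: "happy_patterson")
second = Seeds.fake_user!(username: "happy_patterson")
IO.inspect(first == second, label: "idempotent")
```

Running the seed twice returns the same record instead of failing, which is the property that matters when a preview environment gets redeployed.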
There are a few reasons why I like scripting explicit test seed data for your use case (and similar ones):
It's minimal, and therefore both financially cheap to host (e.g. in a minimal/small-sized DB server/instance) and quick to seed.
The seed data can be as friendly as you want. You can give your test objects meaningful, distinguishable names. You can create example objects for specific scenarios, e.g. for common issues. You can create example objects for specific bugs, or features, or for testing "plumbing" changes.
Because you're relying on this seed data, there are obvious points during development and maintenance where it makes sense to add or update (or even delete) seed data, and this is much easier in these kinds of scripts (that you wrote yourself) than in a "DB backup" script (IME).
I've often run into the problem of not having good test/seed/example data when fixing bugs, adding features, etc. If that data wasn't already in some purged/anonymized seed data, and findable (or knowable) in it, then I'd either have to do without, manually mangle some to fit my needs (and then lose all of my changes when/if I reseeded my (local) dev/test DBs), or write seed scripts anyway.
I've found it to be very helpful to know, off the top of my head, exactly where/how to find good example data in my test/dev/preview DBs. I know the user info and the key example objects, and they have "big dumb" names like "Happy Patterson" (for a "happy path" customer). (For local DBs, all the user passwords are "password".)
Another very helpful aspect of scripting seed data is that you can leverage your existing app/site/system code. Not only does that test that code, but updates to that code are automatically "incorporated" into the seed data (when you reseed a DB). If you change your "data schemas", you don't also need to ("manually") modify your seed data scripts to match (and you can't forget to do it).
Scripts also work nicely if your app/site/system needs to interface with test/demo environments of any third-party services. I'm not sure how you'd handle that with a "DB backup" data script; probably another script.
Ironically, I started writing seed scripts because I didn't think I had enough time to work on purging and anonymizing a copy of the production DB.
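As a rough illustration of the "big dumb names" idea, a scenario-based seed module could look something like the sketch below. Only "Happy Patterson" comes from the post; the other customers, the field names, and the `PreviewSeeds` module are hypothetical, and `create_fun` stands in for whatever context function your app already exposes.

```elixir
defmodule PreviewSeeds do
  # Hypothetical scenario data: each name says what the record is for, so it
  # is easy to find in a preview or dev database.
  @customers [
    %{name: "Happy Patterson", scenario: :happy_path, email: "happy@example.test"},
    %{name: "Overdue Olsen", scenario: :overdue_invoice, email: "overdue@example.test"},
    %{name: "Empty Edwards", scenario: :no_orders, email: "empty@example.test"}
  ]

  def customers, do: @customers

  # `create_fun` is assumed to behave like a context function that takes
  # attrs and returns {:ok, record}. Going through it means the seeds use
  # the same validations as the app itself.
  def run(create_fun) when is_function(create_fun, 1) do
    Enum.map(@customers, fn attrs ->
      {:ok, customer} = create_fun.(attrs)
      customer
    end)
  end
end

# Stand-in for a real context function, so the sketch runs on its own.
fake_create = fn attrs -> {:ok, Map.put(attrs, :id, System.unique_integer([:positive]))} end

created = PreviewSeeds.run(fake_create)
IO.inspect(Enum.map(created, & &1.name), label: "seeded customers")
```

Adding a new scenario for a bug or feature is then a one-line change to the list, which is the kind of edit that tends to be painful in a trimmed production dump.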
I agree, and I think that we will eventually evolve toward using our own custom seeding code, based on our existing application code.
But we have quite a large codebase/domain model, and writing those seeds from scratch will be laborious! So I think for now we will stick with an automated subset of our production database until we write our seeding code.
I will probably publish a blog post on the topic. I found out that deploying pull requests on ad-hoc environments was way more complicated than what PaaS vendors are telling us.
Funny enough, I really only got started writing custom seed code for my own (big) Elixir project after I couldn't restore a backup of my coworker's own "subset of prod"!
I found that only the first "big slice" of custom seed data was particularly time-consuming (and even then it only took a few hours over 1-2 days, maybe). While adding that first slice, I added helpers (and refactored a few things), and since then adding new seed data has been much easier.
I would definitely suggest just starting with enough seed data for your current projects (i.e. issues/tickets/stories/whatever), not writing "all the seeds you (think you'll) need" all at once.
GitLab's version of preview environments for pull requests uses Docker and K8s, so I'm guessing it assumes you can already deploy an arbitrary environment.
Note that you can access your app's code, so you can, e.g., call "regular" functions in your app's modules or its dependencies, or extract code or data into new modules:
Thank you @kenny-evitt. How do you handle wanting to generate a lot of something? Or do you not often find yourself needing larger samples to operate on?
Pragmatically, I've mostly been slowly accumulating seed data for specific purposes, so I've mostly been adding small "bespoke" samples.
Generating larger samples is, in general, arbitrarily hard (or can be), e.g. "mimicking production".
But again, in practice, it's been easy enough to write a function like MyApp.Test.AThing.random_thing/x if you do need lots of seed items. I've found a lot of overlap between code like that that's useful for tests and for generating seed data. This has mostly been helpful for, e.g., testing UI lists or bounds/filters of complex DB queries, and not (much at all) for "simulating production".
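For what it's worth, a generator in the spirit of `random_thing` can stay very small. The sketch below is only a guess at the shape: the `BulkSeeds` module, the field names, and the value ranges are all hypothetical, and it only builds attrs maps (persisting them would go through your own context functions).

```elixir
defmodule BulkSeeds do
  # Hypothetical generator: returns attrs for one item, with any field
  # overridable by the caller (keyword list or map).
  def random_thing(overrides \\ %{}) do
    n = System.unique_integer([:positive])

    Map.merge(
      %{
        name: "Thing #{n}",
        price_cents: Enum.random(100..10_000),
        status: Enum.random([:draft, :active, :archived])
      },
      Map.new(overrides)
    )
  end

  # Build `count` items, e.g. to fill a paginated UI list or to exercise the
  # bounds and filters of a query.
  def random_things(count, overrides \\ %{}) when is_integer(count) and count >= 0 do
    Enum.map(1..count//1, fn _ -> random_thing(overrides) end)
  end
end

things = BulkSeeds.random_things(250, status: :active)
IO.inspect(length(things), label: "generated")
```

Pinning a field through the overrides (here `status: :active`) keeps the bulk data random where it doesn't matter and predictable where a filter or list view depends on it.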