Why do Ecto migrations not use raw SQL and instead rely on Ecto Schemas?

While working with some colleagues who are familiar with other languages and frameworks, I was asked why Ecto migrations don’t use raw SQL and instead typically rely on Ecto schemas to insert data. I have never seen any examples of raw SQL in Ecto migrations, but it would make sense to isolate the database “bootstrapping” away from any specific code implementation that might show up in your schema files.

Any thoughts on this? Thanks!

I would actually assert that using Ecto.Schemas, changeset functions, and/or context modules directly in migrations is dangerous, independent of whether doing pure data insertions in a migration is a good or bad practice - which was part of my position in the forum thread about Seeds as migrations.

My main concern with this practice is that I believe your migrations should remain valid to execute from start to finish throughout the life of your project - no shortcuts with mix ecto.dump/mix ecto.load, etc. I also believe that once a migration has been applied anywhere outside of the author’s workstation, you should treat that migration file as largely read-only and not go mucking about with its meaningful content later on. Aesthetic changes to syntax are still okay if the behavior remains the same. This is obviously a subjective position to take, and a fairly militant one.

Now, with that context in mind, here’s how those concerns pertain to this question: If your schemas evolve sufficiently over time, and you’re intentionally using schema structs and/or changeset functions in migration files, you’ll likely eventually experience the problem that the data operations in your earlier migrations, which were valid at the time, no longer pass validations or otherwise fail to execute with the current versions of those schemas/changesets. How do you solve this? Adhering to my constraints above, you nearly have to resort to schemaless Ecto queries or raw SQL.
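A minimal sketch of that fallback, with hypothetical module and table names: repo().insert_all/2 takes a plain table-name string, so the migration never touches a schema module. (If the table has timestamp columns, you’d need to supply those values explicitly, since no schema defaults apply.)

```elixir
defmodule MyApp.Repo.Migrations.SeedRoles do
  use Ecto.Migration

  def up do
    # Schemaless insert: the source is a table-name string, not a schema
    # module, so later changes to a Role schema cannot break this migration.
    repo().insert_all("roles", [
      %{name: "admin"},
      %{name: "member"}
    ])
  end

  def down do
    execute "DELETE FROM roles WHERE name IN ('admin', 'member')"
  end
end
```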

It’s not often that I get to hold up Ruby/Rails as a positive example for Elixir devs, but they’ve also previously included some notes in their official documentation about the pitfalls of this approach, and techniques to avoid this tight temporal coupling between migrations and ActiveRecord models. That section appears to have been removed from the latest docs as of Rails 5.2, but some of the ideas in the historical link are still pertinent.

9 Likes

Yes! That was exactly the concern (and it came from a Rails developer).

So why don’t the official docs push raw SQL migrations in their examples? I’m not even sure I know HOW to issue raw SQL statements using Ecto…

Currently I use repo operations (e.g. Repo.insert_all()), though I noticed the behavior differs depending on whether you pass an atom (a schema module) or a string (a table name) as the first argument. Regardless, it’s all in an effort to avoid using any schema modules in our migrations.
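For reference, two ways to issue raw SQL with Ecto - the repo, table, and column names here are hypothetical:

```elixir
# Anywhere you have a SQL-backed repo: Repo.query/3, provided by
# Ecto.Adapters.SQL, returns the raw rows and columns.
{:ok, %{rows: rows}} =
  MyApp.Repo.query("SELECT id, name FROM vendors WHERE active = $1", [true])

# Inside a migration: Ecto.Migration.execute/1 runs an arbitrary SQL string.
execute "UPDATE vendors SET active = true WHERE active IS NULL"
```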

I would consider using schemas in your migrations an antipattern, for the reasons @shanesveller listed.

Just because I am curious, where did you find an example of this?

1 Like

That’s the only example I’ve ever seen in my experience with Elixir to date. In fact, I’m not sure I have ever seen an example of how to run raw SQL.

Examples of using schemas to create seeds:

  1. When you create a new Phoenix project, the example presented in the generated seeds.exs does not use raw SQL, it references a schema:
# Script for populating the database. You can run it as:
#
#     mix run priv/repo/seeds.exs
#
# Inside the script, you can read and write to any of your
# repositories directly:
#
#     Hello.Repo.insert!(%Hello.SomeSchema{})
#
# We recommend using the bang functions (`insert!`, `update!`
# and so on) as they will fail if something goes wrong.
  2. https://phoenixframework.org/blog/seeding-data

  3. Seeding Data in Phoenix - ElixirCasts seems to demonstrate using a context, which I think might be more brittle than using the schemas.

The hex docs at Overview — Phoenix v1.7.10 don’t seem to make mention of seeds, but in any case, I think the predominant example most users will encounter will demonstrate seeding your database with schema structs and not raw SQL.

Seeding data in this case doesn’t seem related to all of this. I view seeds as populating the database for development, in which case structs are fine; that’s their present value. It runs after all the migrations have run anyway. Otherwise, what’s really even the point of having the file? If it’s just the production seeds you ran three years ago, then it’s just a relic with no present value.

I agree about the overall point of not using schemas inside migrations though.

As for why migration files use a DSL and not raw SQL: convenience functions. It lets you write add :foo_id, references(:foos), which sets up the foreign-key constraint, makes sure the column types are correct, etc.
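As a sketch of what that buys you (table names hypothetical), here is that one-liner in context:

```elixir
defmodule MyApp.Repo.Migrations.CreateBars do
  use Ecto.Migration

  def change do
    create table(:bars) do
      add :name, :string, null: false
      # references/2 picks a column type matching foos' primary key and
      # emits the foreign-key constraint; on_delete behavior comes for free.
      add :foo_id, references(:foos, on_delete: :delete_all)
      timestamps()
    end

    create index(:bars, [:foo_id])
  end
end
```

A change/0 migration like this also reverses cleanly on rollback, which hand-written SQL would need an explicit down/0 for.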

2 Likes

Ah, interesting! Following is my take on that, so grain of salt etc…

IMO a seed should be exactly that: the very minimum of data you need when firing up the project. For example, if you have a product database and a set of vendors, you’d want new developers to start out with a basic but realistic data set.
It should also be a viable seed on newly created “clusters” that do not communicate with other clusters at all, so in the given example every newly created shop could have the same vendors, but not necessarily the same products.

This also means that you should probably have functions that handle creating new vendors (gonna stick to this example), even if it is just for internal use, without ever providing a public UI/API. IMO, your seeds should only invoke functions you could also use in production, to create new vendors, and more or less emulate what a developer would probably do when setting up a fresh app in whatever environment.

Using structs to create database entries should be limited to test factories (again imo, others might disagree) to circumvent potential side effects in the actual create functions.

Would you mind giving some more information on how you were using it that resulted in your post? Maybe we are not talking about the same thing :)

Edit: You linked the seeds tutorial, and I agree. I would not use that pattern, I would prefer using functions that are tested and production ready. So instead of inserting a vendor, I would write a function register_vendor to handle it. So, if the seeds fail, the function should be rewritten.
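A sketch of that seeds style - the Vendors context and register_vendor/1 are hypothetical names standing in for whatever production-ready function your app exposes:

```elixir
# priv/repo/seeds.exs
alias MyApp.Vendors

for attrs <- [%{name: "Acme"}, %{name: "Globex"}] do
  # Go through the same function production code uses, so the seed
  # fails loudly if that function's contract ever changes.
  case Vendors.register_vendor(attrs) do
    {:ok, _vendor} -> :ok
    {:error, changeset} -> raise "Seed failed: #{inspect(changeset.errors)}"
  end
end
```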

Sorry, I should clarify: this discussion applies to both migrations and to seeds… the bigger picture here came out of dealing with enums in the database. From my experience with MySQL, I know that those do not migrate well - e.g. adding a new value to the list of allowed values was a problem. I don’t know if that’s true with Postgres.

Nonetheless, instead of an enum column, I used a related table and a foreign-key relationship. That’s preferable since it allows us to add a few bits of extra data to the values if needed.

But that got me to the point of wanting to include seed data (and therefore schema structs, per the available examples) for those tables as migrations, because in my mind they are part of the database definition - had we defined the columns as enums, their definitions WOULD be in the migrations. So I wound up having some seeded data in my migrations. That led to a separate post, and there are some strong opinions about whether this is an appropriate practice, but in this case it seems to make a lot of sense and solves the problem of trying to seed data into servers where we may or may not have access or a working command.
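A sketch of that lookup-table approach in a single migration (module and table names hypothetical): the DDL and the seed rows travel together, the way an enum’s values would, but without any schema module. flush/0 forces the table creation to run before the insert.

```elixir
defmodule MyApp.Repo.Migrations.CreateStatuses do
  use Ecto.Migration

  def up do
    create table(:statuses) do
      add :name, :string, null: false
      add :description, :string
    end

    create unique_index(:statuses, [:name])

    # Execute the DDL above now, so the table exists for the insert below.
    flush()

    repo().insert_all("statuses", [
      %{name: "active", description: "Visible to customers"},
      %{name: "archived", description: "Hidden from the storefront"}
    ])
  end

  def down do
    drop table(:statuses)
  end
end
```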

Here are the docs for executing raw SQL from a migration.

https://hexdocs.pm/ecto_sql/Ecto.Migration.html#execute/1
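For completeness, execute/1 just runs whatever SQL string you hand it, and execute/2 takes explicit up and down strings so a change/0 migration stays reversible - the extension here is only an example:

```elixir
def up do
  execute "CREATE EXTENSION IF NOT EXISTS citext"
end

def down do
  execute "DROP EXTENSION IF EXISTS citext"
end

# Or, inside change/0, execute/2 with an up string and a down string:
# execute "CREATE EXTENSION IF NOT EXISTS citext",
#         "DROP EXTENSION IF EXISTS citext"
```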

2 Likes

Again, I think you’re using seeds here in a different sense than the Phoenix generator is. If you’re talking about production server bootstrapping, then yeah, perhaps those seeds should live in some kind of migration. That isn’t what seeds.exs is most commonly for. It’s for bootstrapping your development environment with some basic data so you can use the current version of the application. Elixir code makes perfect sense there. Arguably, it should promote using context functions instead of direct inserts, however.

2 Likes