Hello, everyone! This should probably be a blog post, but I don’t have a blog, so here we go: my thoughts and frustrations regarding domain modeling in the face of persistence, distilled into a few paragraphs. I’m interested in your opinions.
This is probably going to get a bit philosophical, but hopefully also stay practical. I’d like to talk about the age-old problem of ORMs. And when it comes to the impedance mismatch problem, Ecto is in exactly the same situation as any other ORM in other technologies. For that reason, I’m going to refer to Ecto as an ORM in this post.
The impedance mismatch
The so-called object-relational impedance mismatch is a misnomer (like so many things in computing). The same problem arises without objects, because the mismatch really exists between graphs and relations.
Graphs are incredibly useful for adding meaning to the data, allowing us to work effectively in a given context. On the other hand, the relational representation is removed from any context, which makes it a great choice for storing data. That way, we can easily add new behaviour even if it needs to shape the data very differently.
But this versatility is both a blessing and a curse. It is a Jack of all trades, but a master of none. Therefore, we either accept significantly worse design of the application code, or end up mapping the relational representation into graphs based on our current use case.
A practical example
I’m going to show a simple example to illustrate the problem of designing code in fear of persistence. The idea is inspired by a video about DDD I watched recently.
Let’s say we need to work with books that have an identifier, a title and an edition. We can ignore everything else to focus on the actual issue. An edition can either be ordinal, identified with a number, or seasonal, identified by a season and year.
Design without persistence
In a world ignorant of persistence, where we can use the full power of idiomatic Elixir, the given description could easily translate into something like this:
defmodule Bookstore.Book do
  defstruct [:id, :title, :edition]
end

defmodule Bookstore.Edition do
  @seasons [:spring, :summer, :autumn, :winter]

  def ordinal(n) when is_integer(n), do: {:ordinal, n}

  def seasonal(season, year) when season in @seasons and is_integer(year) do
    {:seasonal, season, year}
  end
end
The code clearly explains what is going on with the two types of editions: each type is unambiguously identified and contains only the relevant data. Great!
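For illustration, building and consuming a book might look like this (a minimal sketch; the values are made up):

book = %Bookstore.Book{
  id: 1,
  title: "Some Book",
  edition: Bookstore.Edition.seasonal(:autumn, 2023)
}

# Pattern matching makes the two cases and their data explicit.
case book.edition do
  {:ordinal, number} -> "edition ##{number}"
  {:seasonal, season, year} -> "#{season} #{year} edition"
end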
Design constrained by persistence
When tasked with implementing this requirement in a real-world application, though, the design thought process would likely be very different. We would most likely start by creating an Ecto migration, because, after all, everything needs to conform to the database.
Since relational databases aren’t known for their excellent support of sum types, the migration would probably end up looking similar to this:
defmodule Bookstore.Repo.Migrations.CreateBooks do
  use Ecto.Migration

  def change do
    create table(:books) do
      add :title, :string, null: false
      add :edition_type, :integer, null: false
      add :edition_number, :integer
      add :edition_season, :string
      add :edition_year, :integer
    end
  end
end
Inevitably, there is going to be an Ecto.Schema corresponding to the database table.
The table could hold all kinds of invalid data, so we would of course attempt to contain the mess at the application level. We would use changeset validations and hopefully define an enum for edition_type and edition_season.
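To make that concrete, the schema with those enums could look roughly like this. The module name and the exact Ecto.Enum mappings are my assumptions, not something the migration dictates:

defmodule Bookstore.Ecto.Schemas.Book do
  use Ecto.Schema

  schema "books" do
    field :title, :string
    # Assumed mapping of the integer column to atoms via Ecto.Enum.
    field :edition_type, Ecto.Enum, values: [ordinal: 1, seasonal: 2]
    field :edition_number, :integer
    field :edition_season, Ecto.Enum, values: [:spring, :summer, :autumn, :winter]
    field :edition_year, :integer
  end
end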
This would, however, result in all of our application code knowing about these different fields, carefully checking the type, and keeping track of which other fields are relevant for each type.
Of course, I’m being generous. In many real-world applications, there would be no enum for edition_type; the column probably wouldn’t exist at all. Instead, all the code would check which of the other fields are nil and dispatch logic based on that.
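Purely as an illustration (this function is hypothetical, not taken from any real codebase), the resulting code tends to look like this:

# Dispatching on which nullable columns happen to be set.
def edition_label(book) do
  cond do
    not is_nil(book.edition_number) ->
      "Edition #{book.edition_number}"

    not is_nil(book.edition_season) ->
      "#{book.edition_season} #{book.edition_year}"

    true ->
      raise "book #{book.id} has no edition data"
  end
end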
Even worse, it’s possible that somebody smart enough to do this, but not yet wise enough not to do this, would reuse the same column for edition_number and edition_year, since they both map to integers.
At this point, there is no easy way to use the data correctly and no intuitive way to understand what the invariants even are, because the code does not contain that information.
The worst of both worlds?
Let’s face it, this design with a bunch of nullable columns is both terrible application design and suboptimal database design. Yet, it is the design I see every day in the projects I work on. I think that’s because the tools we have make it the only easy option.
Improving the database design would involve normalizing the data and splitting it into multiple tables. Just imagine the nightmare of all those JOINs and Ecto.Schema associations we would need in our application. That is clearly not worth the extra complexity.
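Just to give an idea of what that would mean, the normalized tables might look something like the sketch below; the table and column names are purely illustrative:

defmodule Bookstore.Repo.Migrations.NormalizeEditions do
  use Ecto.Migration

  def change do
    # One table per kind of edition, each referencing the book it belongs to.
    create table(:ordinal_editions) do
      add :book_id, references(:books), null: false
      add :number, :integer, null: false
    end

    create table(:seasonal_editions) do
      add :book_id, references(:books), null: false
      add :season, :string, null: false
      add :year, :integer, null: false
    end
  end
end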
On the other hand, we could decide that the application design is the only thing that matters and simply serialize the edition as JSON. This would allow us to have the design we wanted with just a custom Ecto.Type. But we would be giving up on so many features of the database.
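As a rough sketch of that route, the custom type could dump the edition tuple into a single map column and load it back. The module name, the :map storage type and the JSON shape are my assumptions; the edition_* columns would be replaced by one map/jsonb column:

defmodule Bookstore.Ecto.Types.Edition do
  use Ecto.Type

  # Stored as a single JSON/map column instead of the four edition_* columns.
  @impl true
  def type, do: :map

  @impl true
  def cast({:ordinal, n} = edition) when is_integer(n), do: {:ok, edition}
  def cast({:seasonal, season, year} = edition) when is_atom(season) and is_integer(year), do: {:ok, edition}
  def cast(_), do: :error

  @impl true
  def dump({:ordinal, n}), do: {:ok, %{"type" => "ordinal", "number" => n}}
  def dump({:seasonal, season, year}), do: {:ok, %{"type" => "seasonal", "season" => Atom.to_string(season), "year" => year}}
  def dump(_), do: :error

  @impl true
  def load(%{"type" => "ordinal", "number" => n}), do: {:ok, {:ordinal, n}}
  def load(%{"type" => "seasonal", "season" => season, "year" => year}) do
    # Safe because the season atoms already exist in Bookstore.Edition.
    {:ok, {:seasonal, String.to_existing_atom(season), year}}
  end
  def load(_), do: :error
end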
Custom mapping?
I have to admit that I am now entering a territory that I have not yet explored in a serious project, so there will be some speculation.
Considering the significant impact this trivial requirement has on the application design, it seems that custom mapping may be the best answer to anything beyond basic CRUD. In the case of books and editions, the mapping could look like this:
defmodule Bookstore.Ecto.Mappers.Books do
  alias Bookstore.Ecto.Schemas.Book, as: BookSchema
  alias Bookstore.Book
  alias Bookstore.Edition

  def to_domain(%BookSchema{} = data) do
    edition =
      case data.edition_type do
        :ordinal -> Edition.ordinal(data.edition_number)
        :seasonal -> Edition.seasonal(data.edition_season, data.edition_year)
      end

    %Book{id: data.id, title: data.title, edition: edition}
  end

  def from_domain(%Book{} = book) do
    data = %BookSchema{id: book.id, title: book.title}

    case book.edition do
      {:ordinal, number} ->
        %{data | edition_type: :ordinal, edition_number: number}

      {:seasonal, season, year} ->
        %{data | edition_type: :seasonal, edition_season: season, edition_year: year}
    end
  end
end
Now we can design the application exactly how we want and choose any storage implementation we decide appropriate. The only thing that will need to change is the mapper. Of course, normalizing into multiple tables would require a larger change of the mapper, but the domain model would still stay the same.
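For instance, a context function could keep the mapping at the boundary; the module, function names and Repo calls below are hypothetical, just to show where the mapper would sit:

defmodule Bookstore.Books do
  alias Bookstore.Ecto.Mappers.Books, as: Mapper
  alias Bookstore.Ecto.Schemas.Book, as: BookSchema
  alias Bookstore.Repo

  # Load a row and immediately translate it, so the rest of the
  # application only ever sees the domain struct.
  def get_book!(id) do
    BookSchema
    |> Repo.get!(id)
    |> Mapper.to_domain()
  end

  # Translate the domain struct back into a schema before persisting it.
  def create_book(%Bookstore.Book{} = book) do
    book
    |> Mapper.from_domain()
    |> Repo.insert()
  end
end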
Unfortunately, this approach has downsides as well. We need to write the mapping code on our own, but what’s worse, we lose important features of Ecto. Change tracking is now gone and we will need to perform even more mapping for data that comes from the outside, duplicating many of the fields.
Perhaps most importantly, enforcing these mappers and keeping their design consistent is going to be difficult and require some discipline from everyone involved in the codebase. After all, it is easier to follow design decisions set by a framework.
A possible compromise?
Mapping everything on our own is clearly a difficult task. Moreover, we are throwing away more than we would like. After all, what’s the point of ORMs if we need to do the mapping ourselves anyway? This is their job!
Using Ecto.Schemas as our domain data structures may be a good trade-off. But we need a way to model the domain without conforming everything to the database design. This includes nested data, sum types, mapping multiple columns into one field and probably a way to build one schema from multiple tables. Maybe then Ecto could be “good enough” as an ORM.
It’s entirely possible that this way of thinking leads directly into the trap described in The Vietnam of Computer Science. I may simply be too inexperienced to see that. Maybe it’s not worth it to add all this extra complexity to Ecto. But in that case, custom mapping seems like the only option left.
Conclusion
Just like many other frameworks and ORMs, Phoenix and Ecto present a devil’s bargain. As long as we are building a web interface for a database, where one form field maps to one column and the application does not need to do anything complicated with the data, everything is simple. But anything beyond that quickly starts to hurt.
Custom mapping may be a lot of work, but in order for Ecto to be good enough as an ORM, I’m afraid it would need to evolve way beyond what it is now. Assuming that an ORM can actually be good enough.
In the end, if we want to keep all the nice benefits that Ecto provides, the following will always remain true: Phoenix may not be your application, but Ecto is.