Sorc96

Design discussion - Ecto is your application

Hello, everyone! This should probably be a blog post, but I don’t have a blog, so here we go :smile: My thoughts and frustrations regarding domain modeling in the face of persistence, distilled into a few paragraphs. I’m interested in your opinions.

This is probably going to get a bit philosophical, but hopefully also stay practical. I’d like to talk about the age-old problem of ORMs. When it comes to the impedance mismatch problem, Ecto is in the exact same situation as any ORM in other technologies. For that reason, I’m going to refer to Ecto as an ORM in this post.

The impedance mismatch

The so-called object-relational impedance mismatch is a misnomer (like so many things in computing). The same problem arises without objects, because the mismatch really exists between graphs and relations.

Graphs are incredibly useful for adding meaning to the data, allowing us to work effectively in a given context. On the other hand, the relational representation is removed from any context, which makes it a great choice for storing data. That way, we can easily add new behaviour even if it needs to shape the data very differently.

But this versatility is both a blessing and a curse. It is a Jack of all trades, but a master of none. Therefore, we either accept significantly worse design of the application code, or end up mapping the relational representation into graphs based on our current use case.

A practical example

I’m going to show a simple example to illustrate the problem of designing code in fear of persistence. The idea is inspired by a video about DDD I watched recently.

Let’s say we need to work with books that have an identifier, a title and an edition. We can ignore everything else to focus on the actual issue. An edition can either be ordinal, identified with a number, or seasonal, identified by a season and year.

Design without persistence

In a world ignorant of persistence, where we can use the full power of idiomatic Elixir, the given description could easily translate into something like this:

defmodule Bookstore.Book do
  defstruct [:id, :title, :edition]
end

defmodule Bookstore.Edition do
  @seasons [:spring, :summer, :autumn, :winter]

  def ordinal(n) when is_integer(n), do: {:ordinal, n}

  def seasonal(season, year) when season in @seasons and is_integer(year) do
    {:seasonal, season, year}
  end
end

The code clearly explains what is going on with the two types of editions: each type is unambiguously identified and contains only the relevant data. Great!
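To show how simple the consuming code can stay, here is a minimal sketch of a function that pattern-matches on the tagged tuples. The `Bookstore.EditionFormatter` module and its `label/1` function are hypothetical and only serve as an illustration; the `Edition` module is repeated so the snippet runs on its own.

```elixir
defmodule Bookstore.Edition do
  @seasons [:spring, :summer, :autumn, :winter]

  def ordinal(n) when is_integer(n), do: {:ordinal, n}

  def seasonal(season, year) when season in @seasons and is_integer(year) do
    {:seasonal, season, year}
  end
end

# Hypothetical consumer: one clause per edition type, no nil checks needed.
defmodule Bookstore.EditionFormatter do
  def label({:ordinal, n}), do: "Edition #{n}"

  def label({:seasonal, season, year}) do
    "#{String.capitalize(Atom.to_string(season))} #{year}"
  end
end
```

Any value that is not one of the two shapes fails with a `FunctionClauseError` instead of being silently misread.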

Design constrained by persistence

When tasked with implementing this requirement in a real-world application, though, the design thought process would likely be very different. We would most likely start by creating an Ecto migration, because after all, everything needs to conform to the database.

Since relational databases aren’t known for their excellent support of sum types, the migration would probably end up looking similar to this:

defmodule Bookstore.Repo.Migrations.CreateBooks do
  use Ecto.Migration

  def change do
    create table(:books) do
      add :title, :string, null: false
      add :edition_type, :integer, null: false
      add :edition_number, :integer
      add :edition_season, :string
      add :edition_year, :integer
    end
  end
end

Inevitably, there is going to be an Ecto.Schema corresponding to the database table.

The table could hold all kinds of invalid data, so we would of course attempt to contain the mess at the application level. We would use changeset validations and hopefully define an enum for edition_type and edition_season.
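A rough sketch of what such a schema might look like, assuming Ecto 3.10 or later. The integer values chosen for `edition_type` and the `validate_edition/1` helper are my own assumptions for illustration, and `Mix.install/1` is only there so the snippet can run as a standalone script.

```elixir
# Only needed when running this as a standalone script.
Mix.install([{:ecto, "~> 3.10"}])

defmodule Bookstore.Ecto.Schemas.Book do
  use Ecto.Schema
  import Ecto.Changeset

  schema "books" do
    field :title, :string
    # Integer-backed enum, matching the :integer column (values are assumed).
    field :edition_type, Ecto.Enum, values: [ordinal: 1, seasonal: 2]
    field :edition_number, :integer
    # String-backed enum, matching the :string column.
    field :edition_season, Ecto.Enum, values: [:spring, :summer, :autumn, :winter]
    field :edition_year, :integer
  end

  def changeset(book, attrs) do
    book
    |> cast(attrs, [:title, :edition_type, :edition_number, :edition_season, :edition_year])
    |> validate_required([:title, :edition_type])
    |> validate_edition()
  end

  # Hypothetical helper: which fields are required depends on the type.
  defp validate_edition(changeset) do
    case get_field(changeset, :edition_type) do
      :ordinal -> validate_required(changeset, [:edition_number])
      :seasonal -> validate_required(changeset, [:edition_season, :edition_year])
      _ -> changeset
    end
  end
end
```

Note that this only checks that the relevant fields are present. It does not reject an ordinal edition that also carries a season, so some invalid combinations still get through.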

This would however result in all our application code knowing about these different fields, carefully checking the type and knowing which other fields are relevant based on the type.

Of course, I’m being generous. In many real-world applications, there would be no enum for edition_type; the column probably wouldn’t exist at all. Instead, all the code would check which of the other fields are nil and dispatch logic based on that.
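To make the nil-checking style concrete, here is a small sketch of what such code tends to look like. The `Bookstore.LegacyLabels` module is hypothetical, and plain maps stand in for the schema structs so the snippet runs without Ecto.

```elixir
# Hypothetical example of dispatching on which nullable columns are set.
defmodule Bookstore.LegacyLabels do
  # A row with both a number and a season silently counts as ordinal.
  def label(%{edition_number: n}) when not is_nil(n), do: "Edition #{n}"

  def label(%{edition_season: season, edition_year: year})
      when not is_nil(season) and not is_nil(year) do
    "#{String.capitalize(season)} #{year}"
  end

  # Half-filled rows end up here; nothing tells us what they were meant to be.
  def label(_row), do: "Unknown edition"
end
```

Every function that touches an edition has to repeat this guessing game, and the order of the clauses quietly becomes part of the business logic.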

Even worse, it’s possible that somebody smart enough to do this, but not yet wise enough not to do this, would reuse the same column for edition_number and edition_year, since they both map to integers.

At this point, there is no easy way to use the data correctly and no intuitive way to understand what the invariants even are, because the code does not contain that information.

The worst of both worlds?

Let’s face it, this design with a bunch of nullable columns is both terrible application design and suboptimal database design. Yet, it is the design I see every day in the projects I work on. I think that’s because the tools we have make it the only easy option.

Improving the database design would involve normalizing the data and splitting it into multiple tables. Just imagine the nightmare of all those JOINs and Ecto.Schema associations we would need in our application. That is clearly not worth the extra complexity.

On the other hand, we could decide that the application design is the only thing that matters and simply serialize the edition as JSON. This would allow us to have the design we wanted with just a custom Ecto.Type. But we would be giving up on so many features of the database.
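As a minimal sketch of the JSON option, these are the four callbacks a custom Ecto.Type expects (`type/0`, `cast/1`, `load/1`, `dump/1`). I wrote them as a plain module so the snippet runs without Ecto; in a real project the module would `use Ecto.Type`. The module name and the JSON keys (`"kind"`, `"number"`, and so on) are assumptions of mine.

```elixir
defmodule Bookstore.EditionJson do
  @seasons [:spring, :summer, :autumn, :winter]

  # Stored as a map, i.e. a JSON column in the database.
  def type, do: :map

  # cast/1 accepts values coming from application code.
  def cast({:ordinal, n} = edition) when is_integer(n), do: {:ok, edition}

  def cast({:seasonal, season, year} = edition)
      when season in @seasons and is_integer(year),
      do: {:ok, edition}

  def cast(_other), do: :error

  # dump/1 turns the domain value into a JSON-friendly map.
  def dump({:ordinal, n}) when is_integer(n) do
    {:ok, %{"kind" => "ordinal", "number" => n}}
  end

  def dump({:seasonal, season, year}) when season in @seasons and is_integer(year) do
    {:ok, %{"kind" => "seasonal", "season" => Atom.to_string(season), "year" => year}}
  end

  def dump(_other), do: :error

  # load/1 rebuilds the domain value from what the database returns.
  def load(%{"kind" => "ordinal", "number" => n}) when is_integer(n) do
    {:ok, {:ordinal, n}}
  end

  def load(%{"kind" => "seasonal", "season" => season, "year" => year})
      when is_integer(year) do
    # Look the season up in the known list instead of creating new atoms.
    case Enum.find(@seasons, &(Atom.to_string(&1) == season)) do
      nil -> :error
      atom -> {:ok, {:seasonal, atom, year}}
    end
  end

  def load(_other), do: :error
end
```

The domain code only ever sees the tagged tuples, but the database can no longer enforce anything about the shape of an edition, and querying by season or year has to go through JSON operators.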

Custom mapping?

I have to admit that I am now entering a territory that I have not yet explored in a serious project, so there will be some speculation.

Considering the significant impact this trivial requirement has on the application design, it seems that custom mapping may be the best answer to anything beyond basic CRUD. In the case of books and editions, the mapping could look like this:

defmodule Bookstore.Ecto.Mappers.Books do
  alias Bookstore.Ecto.Schemas.Book, as: BookSchema
  alias Bookstore.Book
  alias Bookstore.Edition

  def to_domain(%BookSchema{} = data) do
    edition =
      case data.edition_type do
        :ordinal -> Edition.ordinal(data.edition_number)
        :seasonal -> Edition.seasonal(data.edition_season, data.edition_year)
      end

    %Book{id: data.id, title: data.title, edition: edition}
  end

  def from_domain(%Book{} = book) do
    data = %BookSchema{id: book.id, title: book.title}

    case book.edition do
      {:ordinal, number} ->
        %{data | edition_type: :ordinal, edition_number: number}

      {:seasonal, season, year} ->
        %{data | edition_type: :seasonal, edition_season: season, edition_year: year}
    end
  end
end

Now we can design the application exactly how we want and choose any storage implementation we decide appropriate. The only thing that will need to change is the mapper. Of course, normalizing into multiple tables would require a larger change of the mapper, but the domain model would still stay the same.

Unfortunately, this approach has downsides as well. We need to write the mapping code on our own, but what’s worse, we lose important features of Ecto. Change tracking is now gone and we will need to perform even more mapping for data that comes from the outside, duplicating many of the fields.

Perhaps most importantly, enforcing these mappers and keeping their design consistent is going to be difficult and require some discipline from everyone involved in the codebase. After all, it is easier to follow design decisions set by a framework.

A possible compromise?

Mapping everything on our own is clearly a difficult task. Moreover, we are throwing away more than we would like. After all, what’s the point of ORMs if we need to do the mapping ourselves anyway? This is their job!

Using Ecto.Schemas as our domain data structures may be a good trade-off. But we need a way to model the domain without conforming everything to the database design. This includes nested data, sum types, mapping multiple columns into one field and probably a way to build one schema from multiple tables. Maybe then Ecto could be “good enough” as an ORM.

It’s entirely possible that this way of thinking leads directly into the trap described in The Vietnam of computer science. I may simply be too inexperienced to see that. Maybe it’s not worth it to add all this extra complexity to Ecto. But in that case, custom mapping seems like the only option left.

Conclusion

Just like many other frameworks and ORMs, Phoenix and Ecto present a devil’s bargain. As long as we are building a web interface for a database, where one form field maps to one column and the application does not need to do anything complicated with the data, everything is simple. But anything beyond that quickly starts to hurt.

Custom mapping may be a lot of work, but in order for Ecto to be good enough as an ORM, I’m afraid it would need to evolve way beyond what it is now. Assuming that an ORM can actually be good enough.

In the end, if we want to keep all the nice benefits that Ecto provides, the following will always remain true. Phoenix may not be your application, but Ecto is.

Most Liked

al2o3cr

IMO this is fundamentally repeating the mapping that Ecto’s already doing to turn the list-of-lists response from the database into BookSchema structs.

It also sounds like you’re looking for a feature like ActiveRecord’s composed_of.

If I was building the system you’re describing, I’d reach for polymorphic_embed and have Bookstore.Edition.Seasonal and Bookstore.Edition.Ordinal embedded schemas. That’s just as straightforward to pattern-match as a tagged tuple, and can participate in things like protocols.
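A minimal sketch of the protocol part of that idea. To keep it runnable without dependencies, these are plain structs rather than embedded schemas, and the polymorphic_embed wiring is left out entirely. The `Bookstore.Edition.Label` protocol is a hypothetical name.

```elixir
defmodule Bookstore.Edition.Ordinal do
  defstruct [:number]
end

defmodule Bookstore.Edition.Seasonal do
  defstruct [:season, :year]
end

# Hypothetical protocol: each edition type brings its own implementation.
defprotocol Bookstore.Edition.Label do
  def label(edition)
end

defimpl Bookstore.Edition.Label, for: Bookstore.Edition.Ordinal do
  def label(%{number: n}), do: "Edition #{n}"
end

defimpl Bookstore.Edition.Label, for: Bookstore.Edition.Seasonal do
  def label(%{season: season, year: year}) do
    "#{String.capitalize(Atom.to_string(season))} #{year}"
  end
end
```

Callers use `Bookstore.Edition.Label.label(edition)` without knowing which struct they hold, and a new edition type only needs a new struct plus a `defimpl`.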

LostKobrakai

I think Ecto is hard to explain because, being a data mapping tool, it is great at two edges of the system: the one where data comes in and the one where data goes out (usually for storage). In a lot of cases, where the data is shaped to allow for it, it’s very easy to not think of those as two edges and just wire the data right through. That’s therefore a lot of what we see in the code shared around Elixir. Imo this is just a lazy man’s shortcut, though: fair to use where possible, but it needs to be acknowledged as not the way forward once it no longer fits. If there’s a need for richer domain modeling that deviates from the constraints of storing the data, then that’s imo no longer Ecto’s job, even if Ecto might be part of implementing it. That’s one of the places where Ecto deviates hard from what ORMs commonly do. Ecto wants to model the data stored in your db by embracing that the data needs to go into a db, instead of finding magic ways to make you do things dbs cannot do well, like polymorphic relationships.

If one is fine with the lack of referential integrity of ref_id, ref_type columns then that can be built with ecto. Others might go with abstract tables polymorphism, which retains referential integrity. Or you might not need to have foreign keys, so you opt for polymorphic data in a json column. One might even use all three options depending on their distinct tradeoffs.

katafrakt

Thanks for this post. It touches on a lot of things that have been on my mind for the last few years, working on larger Phoenix applications in larger teams. Quite often it felt like following Ecto “defaults” is limiting and leads to the database leaking everywhere. At the same time, the community does not seem to have an alternative proposal.

I completely agree that “designing with persistence” is problematic. Perhaps my favourite example is a database table we had that represented a Thing. Part of the data in this Thing was updated quite seldom, in long-running transactions; another part was updated frequently and pretty much atomically (think: view count on an article). The problem: while the long transaction was running, it locked the row and those “quick updates” kept piling up, ultimately timing out, flooding our error reporting and leading to data loss. The solution was to split the table into two tables, so that the slow transaction locked only part of the data.

I’m sharing this example because I think it’s a pretty convincing illustration of the fact that storage has its quirks and ways of working that force decisions which make no sense at the domain logic level. These two tables were modeling a single Thing, always queried together. But of course there were two Ecto schemas and preloading everywhere…

Custom mapping

I admit that in one project, after long discussions, we tried this approach. But it did not work well. Especially at the beginning, it was seen as just unnecessary boilerplate. Even worse, we decided to have Schemas.Order and Structs.Order, so every time you saw Order in the code, you had to check which one was aliased.

There was probably a lot of poor judgement and bad design involved, but this turned out to not really be beneficial.

So what else?

Two things come to mind as alternative approaches:

  1. CQRS, not necessarily paired with event sourcing, to separate read modeling from write modeling. Unfortunately, that is a hard sell for many teams, because the concept has been paired with other concepts too often and people think of it as a package with Kafka, eventual consistency, poor transactionality etc.

  2. Embrace repositories. Let schemas be just a thin wrapper over database tables. But all meaningful data reading and writing should go through repository modules. This leaves schemas as purely application layer. Repositories sit on the border between application and domain layer, accepting and emitting domain structs, internally working with Ecto schemas. Unfortunately, you probably have to give up on Ecto changesets for that, or limit their usage to type coercion only, which again makes it a hard sell. Validation in that case should probably be done with something like Drops or with schemaless changesets.
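A rough sketch of the shape such a repository module could take. To keep it runnable without a database, an Agent holding row-shaped maps stands in for Ecto.Repo and the schema; that stand-in and the `Bookstore.BookRepository` name are assumptions for illustration only. The point is the public interface, which accepts and returns domain structs.

```elixir
defmodule Bookstore.Book do
  defstruct [:id, :title, :edition]
end

# Hypothetical repository: callers only ever see %Bookstore.Book{} structs.
defmodule Bookstore.BookRepository do
  alias Bookstore.Book

  # The Agent is a stand-in for Ecto.Repo plus a "books" table.
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  def save(%Book{id: id} = book) do
    Agent.update(__MODULE__, &Map.put(&1, id, to_row(book)))
    {:ok, book}
  end

  def get(id) do
    case Agent.get(__MODULE__, &Map.fetch(&1, id)) do
      {:ok, row} -> {:ok, from_row(row)}
      :error -> {:error, :not_found}
    end
  end

  # Domain struct -> flat, column-shaped row.
  defp to_row(%Book{edition: {:ordinal, n}} = book) do
    %{id: book.id, title: book.title, edition_type: :ordinal,
      edition_number: n, edition_season: nil, edition_year: nil}
  end

  defp to_row(%Book{edition: {:seasonal, season, year}} = book) do
    %{id: book.id, title: book.title, edition_type: :seasonal,
      edition_number: nil, edition_season: season, edition_year: year}
  end

  # Flat row -> domain struct.
  defp from_row(%{edition_type: :ordinal} = row) do
    %Book{id: row.id, title: row.title, edition: {:ordinal, row.edition_number}}
  end

  defp from_row(%{edition_type: :seasonal} = row) do
    %Book{id: row.id, title: row.title,
          edition: {:seasonal, row.edition_season, row.edition_year}}
  end
end
```

With Ecto behind it, `save/1` and `get/1` would keep the same signatures while the private functions convert to and from schema structs.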

The second approach is also a bit difficult to explain to people because of the Ecto.Repo module. I’ve sometimes heard “we already have a repository, what are you talking about?”. On top of that, repositories are not so easy to design and I haven’t found a lot of good resources about that.

Anyway, many thanks for starting this discussion here. A lot of great replies already. Perhaps if we have more of this kind of conversation, we’ll come up with some ideas.
