Is the schema doing anything at this point, though? Would there be any difference with a regular struct?
Depends on what you expect it to do. It still declares relationships, e.g. to preload them, it still does all the Ecto type handling for converting between the various representations of a type’s values, and it still comes with all the reflection around schemas, …
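For illustration, a few of the reflection functions any schema keeps providing (App.Article is a hypothetical schema used only as an example):

```elixir
# Reflection that comes with every Ecto schema.
App.Article.__schema__(:source)        #=> "articles"
App.Article.__schema__(:fields)        #=> [:id, :title, ...]
App.Article.__schema__(:associations)  #=> [:comments, ...]
```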
Thanks for this post. It touches on a lot of things that have been on my mind for the last few years, working on larger Phoenix applications in larger teams. Quite often it felt like following Ecto “defaults” is limiting and leads to the database leaking everywhere. At the same time, the community does not seem to have an alternative proposal.
I completely agree that “designing with persistence” is problematic. My favourite example is perhaps this: we had a database table representing a Thing. Part of the data in this Thing was updated quite seldom, in long-running transactions; another part was updated frequently and pretty much atomically (think: view count on an article). The problem: while the long transaction was running, it locked the row, and those “quick updates” kept piling up, ultimately timing out, flooding our error reporting and leading to data loss. The solution was to split the table into two tables, so the slow transaction locked only part of the data.
I’m sharing this example because I think it’s a pretty convincing one for the fact that storage has its own quirks and ways of working that force decisions which make no sense at the domain logic level. These two tables were modeling a single Thing, always queried together. But of course there were two Ecto schemas and preloading everywhere…
Custom mapping
I admit that in one project, after long discussions, we tried this approach. But it did not work well. Especially at the beginning it was seen as just unnecessary boilerplate. And even worse, we decided to have Schemas.Order and Structs.Order, so every time you saw Order in the code, you had to check which one was aliased. There was probably a lot of poor judgement and bad design involved, but this turned out to not really be beneficial.
So what else?
Two things come to mind as alternative approaches:
- CQRS, not necessarily paired with event sourcing, to separate read modeling from write modeling. Unfortunately that is a hard sell for many teams, because the concept has been bundled with other concepts too often and people think of it as a package deal with Kafka, eventual consistency, poor transactionality, etc.
- Embrace repositories. Let schemas be just a thin wrapper over database tables, but make all meaningful data reading and writing go through repository modules. This leaves schemas purely in the application layer. Repositories sit on the border between the application and domain layers, accepting and emitting domain structs while internally working with Ecto schemas. Unfortunately, you probably have to give up on Ecto changesets for that, or limit their usage to type coercion, which again makes it a hard sell. Validation in that case should probably be done with something like Drops or with schemaless changesets (see the sketch below).
The second approach is also a bit difficult to explain to people because of the Ecto.Repo module. I’ve heard sometimes that “we already have a repository, what are you talking about?”. And on top of that, repositories are not so easy to design and I haven’t found a lot of good resources about that.
Anyway, many thanks for starting this discussion here. A lot of great replies already. Perhaps if we have more of this kind of conversation, we’ll come up with some ideas.
Thinking about this now, you’re right. Schemas still keep most of their functionality. So I guess the only issue is that a schema like this does not quite feel like a first-class citizen. Instead of Repo.all(Schema), the code would now need to look like Repo.all(Schema.base_query()), which may not actually be a real problem. As already mentioned, the insert, update and probably delete story would be more complicated anyway.
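For concreteness, a sketch of what such a base_query/0 could look like; the schema and its soft-delete column are hypothetical, only the base_query/0 convention comes from the discussion above:

```elixir
defmodule App.Article do
  use Ecto.Schema
  import Ecto.Query

  schema "articles" do
    field :title, :string
    field :deleted_at, :utc_datetime
  end

  # Callers use Repo.all(App.Article.base_query())
  # instead of Repo.all(App.Article).
  def base_query, do: from(a in __MODULE__, where: is_nil(a.deleted_at))
end
```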
Thanks for your replies!
I have very similar experience, mostly with Ruby on Rails, where Active Record really ends up dictating the design of everything, but I have noticed similar issues with Elixir and Ecto. Other frameworks in other languages seem to have the same problem.
I don’t have experience with CQRS, but I’m certainly interested in trying it to see how I like the result. If you know of a good learning resource for it (without the extra parts you mentioned), I’d love to take a look.
As for repositories, I think those would fit the custom mapping proposal, unless I’m mistaken? I don’t even necessarily mind the boilerplate (although convincing others may not be easy), but my main worry is the fact that the design of the repositories is arbitrary. They will undoubtedly end up designed inconsistently and change throughout the project’s lifetime.
So I’m still hopeful for better mapping capabilities as part of Ecto, which would allow us to keep the nice features like changesets and avoid custom mapping, while offering good enough freedom for domain design.
Yeah, it really isn’t. Schema in that place is just one of a few Ecto.Queryable implementations. Under the hood that’s basically turned into a query as well. If you want to be really sneaky, you could even make __schema__/1 overridable and change the implementation for __schema__(:query).
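A sketch of that trick, assuming a hypothetical schema with a soft-delete column. It leans on defoverridable against a macro-generated function, so treat it as an illustration of the idea rather than a supported API:

```elixir
defmodule App.Article do
  use Ecto.Schema
  import Ecto.Query

  schema "articles" do
    field :title, :string
    field :deleted_at, :utc_datetime
  end

  # __schema__/1 is generated by the schema macro above; mark it
  # overridable so the :query clause can be replaced.
  defoverridable __schema__: 1

  # Repo.all(App.Article) now resolves to this filtered query.
  def __schema__(:query), do: from(a in super(:query), where: is_nil(a.deleted_at))
  def __schema__(other), do: super(other)
end
```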
Yes and no. In your example, and in the attempt I made before, the mapping was still 1:1 between the Ecto schema and a “domain struct”. While you can roll up a few columns into a nicer structure (as with edition in your example), you are still limited by your database design.
In my understanding, repositories lift this limitation a bit, allowing you to build domain structs from multiple schemas. Let’s say you have a multilingual e-commerce app:
```elixir
defmodule App.Catalog.Repository do
  import Ecto.Query

  def get_product(id, language) do
    case Repo.get(ProductSchema, id) do
      nil ->
        nil

      product ->
        description = Repo.get_by(ProductDescriptionSchema, product_id: id, language: language)

        price =
          Repo.one(
            from p in ProductPriceSchema,
              where: p.product_id == ^id and p.valid_since < ^DateTime.utc_now(),
              order_by: [desc: p.valid_since],
              limit: 1
          )

        %App.Catalog.Product{name: product.name, price: price.value, description: description.body}
    end
  end
end
```
And in the App.Orders context, you would probably query for the price valid at the moment the order was placed instead of the current price, and you probably don’t care about the description at all. The App.Orders.Product struct will look a bit different.
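For instance, the Orders-side lookup might differ only in the point in time the price is anchored to. A sketch, reusing the setup from the repository above (placed_at is a hypothetical field on the order):

```elixir
# Same price table, but anchored to when the order was placed
# instead of "now".
price =
  Repo.one(
    from p in ProductPriceSchema,
      where: p.product_id == ^id and p.valid_since <= ^order.placed_at,
      order_by: [desc: p.valid_since],
      limit: 1
  )
```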
I don’t, unfortunately, especially not for Elixir (maybe something for .NET, where I think it’s most popular). Maybe I’ll draft a quick example myself in the upcoming days, if time allows.
I’m getting back into Elixir, and after some time in the OO world, I keep shooting myself in the foot trying to model data when instead we should model functions. Pretty data won’t solve any problems. That being said, an ORM is a tempting path to follow that will bite you when you try to do things differently, either through lack of support from the database or from the ORM. ORMs are good when the domain maps cleanly into the database: the schema can be rendered, changed, and then saved without worrying about SQL, and everything becomes easy.
About Ecto, the bad thing is that it is a multi-faceted library; the early split of the SQL module helped. You don’t need to use the SQL module and Repo functions if your database can’t be mapped in a way that makes sense. Ecto can sit at the edges of your application, validating incoming data both from the web and from the database (yes, the database, where there is always a null field that shouldn’t be, and it wrecks the application). You can write your own queries and give them to Ecto to execute/sanitize. That is especially useful when dealing with a legacy database or a database-first design: you get to load only the data you need for that given case, and you don’t subject yourself to mapping 30 joins onto the schema. You can use Repo.load/2 to fill your schema for that particular query result (but I don’t think it will validate the data). You can wrap it all in a DAO (Data Access Object) module, which is a repository from DDD, but without getting into the argument “we already have a Repo”.
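A sketch of the Repo.load/2 approach with a hand-written query (the table, columns, and App.Article schema are hypothetical); note it only performs type loading, not changeset validation:

```elixir
# Run raw SQL, then load each row into a schema struct.
result = Ecto.Adapters.SQL.query!(App.Repo, "SELECT id, title FROM articles", [])

articles =
  Enum.map(result.rows, fn row ->
    App.Repo.load(App.Article, {result.columns, row})
  end)
```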
TLDR: Functions over Data. ORM sucks, write SQL
I’m a big fan of overriding the default query, although doing it this way feels way too hacky and unwieldy. I would love for this to become a callback on a potential Ecto.Schema behaviour. Then the documentation could show that overriding the query is possible, and even expected, in non-trivial situations.
You have shown that it is easy to use schemas with custom queries already, but I think this could be the small step that makes the feature feel really intentional. Especially when refactoring the database design, the application code would only need to change in this one place.
I kind of disagree here; some JOINs don’t bother me that much. If they do, you can hide them behind a database view (which still makes inserts slightly more annoying and breaks the “code first” strategy somewhat) and/or have your database mapping layer return a neat denormalized list of maps/tuples/your-preference (like you did later in your post), and the rest of your application can happily ignore the relational details. In my experience, unless your domain is changing constantly (which I have experienced: total nightmare), the data mapping layer won’t need to change that much if you’ve done a good job modelling your problem in the database.
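A sketch of the view-backed idea: a read-only schema whose source is a database view rather than a table (all names hypothetical):

```elixir
# A schema can point at a database view just as well as a table;
# "order_summaries" is a hypothetical view that hides the JOINs.
defmodule App.Reporting.OrderSummary do
  use Ecto.Schema

  @primary_key false
  schema "order_summaries" do
    field :order_id, :integer
    field :customer_name, :string
    field :total, :decimal
  end
end
```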
100% agree with you there. That table design would get your pull request (gently) rejected at my job.
That is painful. I agree that it’s easier to just spit out some flat denormalized tables, but it’s very rarely worth it. I don’t think it’s that hard to do proper normalization once you’ve done it a few times, even if you give up some ORM magic, especially with LLMs around to type out boilerplate for you. I genuinely think you’d be better off having tables with an id and a json column and treating them as document stores, rather than a halfway-designed relational schema that will contain invalid data anyway. There’s also Ecto embedded schemas if you want to have sub-entities in your code but keep everything denormalized into a single table in the database. IIRC you can add constraints that test json values in postgres too, but I don’t remember exactly how much they can do.
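A sketch of the embedded-schema option (names hypothetical): the sub-entity is a struct in code but lives in a JSON/map column of the parent’s table, with no separate table or JOIN involved:

```elixir
defmodule App.Orders.Order do
  use Ecto.Schema

  schema "orders" do
    field :total, :integer

    # Stored as a map/JSON column on the orders table.
    embeds_one :shipping_address, Address do
      field :street, :string
      field :city, :string
    end
  end
end
```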
This is kind of why I don’t use them much on my personal projects. Or rather, I don’t use many of their complex/magical features. I view application domain objects and the database as FUNDAMENTALLY different things that will likely need manual mapping. The application domain objects exist to be used in application code, freely passed around, transformed, and deconstructed as needed. The relational database primarily exists to provide a repository where only correct, consistent data can live (it facilitates efficient querying too, of course, but so do NoSQL databases). You may be able to generate a Book struct without an ID, but if you try to send that data to the database, it should be rejected. The database should also reject any entities that try to reference a book with an ID that doesn’t exist, for example, so it does more than just validate the shape of a single struct. I don’t think there can ever be a great automagical mapper between domain objects and relational tables, for the same reason you can’t blindly cast (not the right word, but whatever) JSON from an HTTP request to domain objects: they are distinct entities that model the underlying reality in fundamentally different ways so that we can make different trade-offs.
I think Ecto is already a pretty good ORM. Granted, I’ve only really worked with .NET EF Core professionally, and I don’t use many of its bells and whistles, but so far I have not missed anything while using Ecto. I really love how you can insert SQL queries/query fragments everywhere too. My only complaint so far is that it doesn’t have out-of-the-box first-class support for some postgres features that I like, like UUIDv7 or (IIRC) window functions, but that’s very minor, and I understand why the team chose to make Ecto generic rather than a dedicated postgres API. Also, an obligatory shout-out to the Ecto developers for doing wonderful work with only a fraction of the resources that entities like Microsoft have.
Overall I do not think there is a clean, easy solution to the graph/relational mismatch (I like your rephrasing a lot, I will steal it if you don’t mind). Although sometimes I wonder what we could achieve by embedding postgres or SQLite directly into the runtime like Mnesia…
And please forgive me for anything I’ve said that is myopic or ignorant: I’ve written more SQL than anything else at this point (so I’m biased towards it) and I’ve only been a developer for 5ish years. I am also a C#/SQL Server developer at my dayjob and still learning the Elixir ecosystem.
This is what I heavily prefer. The database is not part of your application; it is an external system that you’re talking to over a network protocol. Letting the database leak into your application will inevitably cause problems as it grows.*
*For those rare systems that don’t change much after the first-pass implementation, like a hacky proof of concept that miraculously manages to stay out of production, it doesn’t matter as much.
Thank you for adding another well thought-out contribution to this topic, I appreciate it!
I see you’re approaching this from the custom mapping side, and of course, that makes most of the other problems disappear. I haven’t really seen an Elixir project that does this, though, so there does not seem to be any guidance on how to do it well.
Absolutely. Don’t get me wrong, the database part of Ecto is really amazing and probably better than anything I have seen anywhere else. The validation part of Ecto is pretty good as well. I’m mostly talking about the mapping capabilities, which don’t seem to be “good enough”, but of course, that’s not a problem if you use custom repositories to do the mapping.
I’d be happy about that!
Whoa ok, wrong time to take a couple of days off, lol.
To address this: you’re right, and I’m only speaking from my narrow experience. The active record pattern (the general one, not Rails’ specifically) pushes this by its very nature. Ecto, of course, doesn’t do that. I haven’t brought it up because a lot of the conversation is geared toward many tables to one schema, but another thing I don’t see enough of (though certainly many people do it) is having schemas be only small slices of tables. One thing that happens a lot when doing table-first design is that you end up with all these join tables, usually to users but also to other entities, that are actually first-class concepts, though that isn’t always apparent when you think of them as join tables. Often Accounts.User is referenced everywhere when it would be better suited to have its own context-specific representation. If it’s 1-1 to User, then the users table can still be used for storage, but that means you can ignore the account-specific columns.
Anyway, there is an overwhelming amount of stuff here and on the forum in general so I’m going to leave it at that for now.
Since OP started by describing this as a graph-relational impedance mismatch: Attempt at adapting KuzuDB's Rust crate into Elixir via NIF, any tips/thoughts?
I find graph databases (Neo4j, Dgraph, etc.) painful to set up and deploy. KuzuDB is a new embedded graph database (think what SQLite is to PostgreSQL), now with a fledgling Elixir binding. It does ask for a strongly typed schema, but maybe there is some new approach to modeling applications that somewhat breaks free from “ecto is your application”.
It does ask for a strongly typed schema
This pulls it ahead of the others, IMO.
Hi, author of kuzu_nif here. KuzuDB is really exciting for graph workloads. It even has an in-browser version (kuzu-wasm).
I have been thinking about this for the past few days. I don’t have real world experience with graph databases, so take this with a grain of salt.
As long as we directly use entities from the database access tool in our domain, they will dictate the design of the domain. I’m afraid this can only truly be avoided by doing our own mapping to domain entities.
Then the question is how constrained are we in designing the domain due to the limitations of the data access library/framework? I’m assuming that a graph database could store the data in a representation that is closer to what we would like in our code, therefore requiring simpler mapping features from the tool that we use to access that data.
However, mapping to a relational database is still clearly possible, it may just need more ergonomic ways to use some of the advanced features of Ecto. In the end, I think this is about convenience. Writing custom mappers with arbitrary design is very different from having a set way of defining mapping using existing tooling.
I also feel the pain of the “impedance mismatch” between core app logic and the persistence layer. I find this is particularly true with DB-specific concepts like transactions. The semantics of interacting with the DB forces you to consider things which are difficult to express natively in the app. That in turn increases the cognitive load of determining what effects local code changes have on global properties like overall throughput.
I don’t have much to add to this discussion in terms of what to do about it in Elixir. I agree with a lot of what’s been said, and I’ve more or less resigned myself to the idea that this is a fundamental problem in data-intensive applications. Ergonomics can always be improved, and Ecto does a good job IMO, but the friction will never entirely disappear.
I will add however that this pain caused me to pay attention to the Rama project over at Red Planet Labs:
Their pitch is to overcome this problem by writing applications in a system where there’s less distinction between app and persistence. There is, frankly, a ton of info to digest over there. The best tl;dr I can provide is that their platform is a combo of event-sourced data storage and materialized views on the data, but where the views are defined/consumed by the app itself using native data-structures.
Unfortunately I don’t think I’ll ever use Rama itself. It’s proprietary and also requires the app be written in Java or Clojure. But I find the ideas appealing and I’m amenable to (my understanding of) the underlying sentiment: a more integrated approach – one where the app “speaks” persistence natively – may be necessary to overcome the mismatch.
I also feel the pain of the “impedance mismatch” between core app logic and the persistence layer. I find this is particularly true with DB-specific concepts like transactions. The semantics of interacting with the DB forces you to consider things which are difficult to express natively in the app. That in turn increases the cognitive load of determining what effects local code changes have on global properties like overall throughput.
Honestly, transactions feel a lot like pipelines built with with, in terms of their properties to short-circuit and roll back.
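A sketch of that analogy (the step functions are hypothetical): the with chain short-circuits on the first error, and Repo.rollback/1 turns that into a rolled-back transaction:

```elixir
# `with` falls through to the else clause on the first non-matching
# step; Repo.rollback/1 then aborts the surrounding transaction.
def place_order(params) do
  App.Repo.transaction(fn ->
    with {:ok, order} <- create_order(params),
         {:ok, _payment} <- charge_payment(order) do
      order
    else
      {:error, reason} -> App.Repo.rollback(reason)
    end
  end)
end
```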
My opinion on this is that this friction will never truly disappear. You can look at higher-level ORMs in languages like C#/Java, where they tried to abstract the data storage as much as they could, and it’s clear that their approach is much less flexible than Ecto’s, not to mention more complex overall than having the low-level options, like queries.
I think that Ecto’s key quality is extensibility. If you are no stranger to how metaprogramming works, you can tailor Ecto to your project’s needs; I’ve seen this applied successfully in a lot of good projects. The same goes for coupling: if you don’t want to couple too much to Ecto, you can do schemaless queries or use Ecto for validations only at the edge of the system.
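A sketch of a schemaless query (table and columns hypothetical): the query targets the table name directly and selects into a plain map, so no schema module is involved:

```elixir
import Ecto.Query

# Query the table by name; schemaless sources require an explicit
# select, which here returns plain maps.
articles =
  from(a in "articles",
    where: not is_nil(a.published_at),
    select: %{id: a.id, title: a.title}
  )
  |> App.Repo.all()
```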
I also think that adding more features to Ecto would be a mistake, as a lot of the issues encountered with Ecto vary from project to project, hence an abstraction for that specific project is much easier and more manageable to build.