DDD, Ecto and Umbrellas

Hey there, I have a question about umbrella structuring. The situation is as follows:

  • I have a “core” domain that acts as a shared kernel for the downstream domains
  • I have several subdomains, where different teams might work independently later on
  • I want to use a single PostgreSQL database for the whole umbrella app
  • Every team will need to manage some database models of their own, in addition to upstream models like the core domain’s

Basically it boils down to this: I have several Phoenix frontend apps, several Ecto backend apps, and some pure-logic apps that might interact with external services but do not need a Phoenix frontend or Ecto persistence.

Deployment must be atomic, hence an umbrella project, what we call the “microlith”.

So far so good; everything worked out well until I reached the point where more than one app needs distinct domain entities persisted in the same database. After a few days of googling, it seems there are these options:

  1. have only one app within the umbrella hold the Ecto Repo, migrations, schemas, and contexts. On the plus side, the Phoenix generators can be used easily, but it defeats the point of separating subdomains, since developers then constantly have to work in their own domain app AND the “db” app

  2. have only one app within the umbrella hold the Ecto Repo, migrations, and schemas, but not the contexts. Here the Phoenix generator cannot be used to generate schema/migration/context in one go, because the contexts live in their proper domain apps. Still the issue of app-switching for developers

  3. have only one app hold the Ecto Repo, but nothing else; migrations/schemas/contexts live in their domain apps. I have no idea how to configure things to get this running.

  4. have every app use Ecto on its own, configured to use the same database. In addition to being a source of bugs (repeated database credential handling), this would spawn a separate Ecto connection pool from every domain app to the very same database. Feels like something one would want to avoid entirely.

Does anyone have other ideas/experiences? If not, are there examples of how I could achieve 3) cleanly? The Phoenix book demonstrates well how to split an app into an ecto/web umbrella, but not how to handle the more complex case of n-ecto/n-web umbrella projects.

3 Likes

I do think that (4) is the best option, because it ensures the least amount of coupling between sub-domains. I’m fairly certain that the Ecto connection-pool size can be configured per repo.
Also, these four repos could fetch their configuration from the same location, mitigating the ‘repeated database credential handling’ issue.
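For example, all the repos could read their credentials from one place in the umbrella’s root config, with a small pool per repo (the app and module names below are hypothetical):

```elixir
# config/config.exs at the umbrella root -- shared by all child apps.
# App/module names are made up for illustration.
shared_db = [
  database: "microlith_dev",
  username: "postgres",
  password: "postgres",
  hostname: "localhost"
]

# Each domain app gets its own repo, but credentials are defined once,
# and pool_size is kept small so the total connection count stays sane.
config :marketing, Marketing.Repo, shared_db ++ [pool_size: 5]
config :finance, Finance.Repo, shared_db ++ [pool_size: 5]
```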

Well, I don’t remember which topic it was, but I remember a similar discussion about having two or more apps pointing to the same db, and one post in it made a point I try to stick to: don’t use the same database from multiple apps.

The main point of having different apps is to define clear separation of concerns and fault tolerance, and saving everything to the same db does tie the apps back together.

So that said, I would just merge all the apps that use the same database, or, if it really makes sense to separate them, create different dbs for each one.

One might say that some of your options would have only one app using the db, like options 1, 2 and 3. But actually, no: all the apps would still share the same db at the end of the day.

1 Like

I don’t think much configuration is really needed for #3. We use a somewhat similar setup, with one app that holds the repo, some schemas, and migrations. I think you’ll need to have all of the migrations in a single app because they need to be run in order. If you go multi database you also need to know exactly what you’re giving up, mainly atomic transactions in the umbrella along with foreign key constraints.
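As a rough sketch of what that setup can look like (all names here are invented): one app owns the Repo and the migrations, while the domain apps define their own schemas and contexts on top of it:

```elixir
# apps/db/lib/db/repo.ex -- the only Repo in the umbrella
defmodule Db.Repo do
  use Ecto.Repo, otp_app: :db, adapter: Ecto.Adapters.Postgres
end

# apps/marketing/lib/marketing/page.ex -- the schema lives in the domain app
defmodule Marketing.Page do
  use Ecto.Schema

  schema "marketing_pages" do
    field :title, :string
    timestamps()
  end
end

# apps/marketing/lib/marketing.ex -- the context also lives in the domain app;
# it depends on :db in its mix.exs and calls the shared repo
defmodule Marketing do
  def get_page!(id), do: Db.Repo.get!(Marketing.Page, id)
end
```

Migrations would stay together in `apps/db/priv/repo/migrations`, so their ordering remains linear.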

1 Like

To me there are two sides of the coin. Similar to the “to spawn or not to spawn” discussion, I feel we should differentiate between how to handle code (modules, migration setup, schemas) and how to handle runtime concerns (db connections, pooling, the “repo module”, running migrations on an actual db, …).

To me the first part should definitely be separated, while the latter part should be configurable, as there are reasons for running on a single db with a single set of connections just as well as for separating everything down to the db layer. Sure, it might not be perfect and there might be collisions in the database, but that’s a tradeoff inherent to using a single database. And if things really are to be split up into multiple nodes/databases, the written code doesn’t stand in the way; it’s just a matter of reconfiguring and moving stuff around.

So I’d say 3.

As to how to do 3. cleanly: have a proper boundary by which contexts handle persistence. If that’s present, switching out the actual persistence implementation should no longer be a big problem, which eases the question of “one repo module”, “some repo modules” or “many repo modules”. It’s more a matter of assigning the available persistence options to the contexts.

You can see this if you ask people how hex packages should handle integration with Ecto. From what I’ve seen, people seem to suggest (and I concur): don’t provide actual migration files, but provide their setup through some module/functions; don’t start your own repo, but make it configurable and let users decide how to integrate into the bigger picture. There are some exceptions, like e.g. eventstore, which has a valid reason to start its own connections because of the additional guarantees it needs, but for most packages I’d say this is the way to go.
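A minimal sketch of such a boundary (all names are made up): the context talks to persistence only through a behaviour, and the concrete implementation is resolved from configuration, so “one repo vs. many repos” becomes a wiring decision rather than a code-structure decision:

```elixir
# The persistence boundary of a hypothetical Billing context.
defmodule Billing.Store do
  @callback insert_invoice(map()) :: {:ok, struct()} | {:error, term()}
end

defmodule Billing do
  # The concrete store is looked up at runtime, so tests or a future
  # split-out deployment can swap it without touching this module.
  defp store, do: Application.get_env(:billing, :store, Billing.EctoStore)

  def create_invoice(attrs), do: store().insert_invoice(attrs)
end
```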

6 Likes

True. That’s why my first suggestion was to merge the apps. :wink:

We somehow iterated into option 2, but we’re lucky in that, so far, each Phoenix app has its own distinct contexts. We’ll probably think about moving to option 1 if duplication/overlap arises. Currently, this works quite well.
One thing I noticed is that it’s quite simple to move things around (even in a medium-sized app), so I propose trying your preferred option first and moving to a less preferred one if needed.
Also, IMO it’s possible to have a mix of 1 and 2, which could potentially be practical too.

I guess that if you need atomicity between parts of your umbrella, then that might be an indication that those parts are better off living together as a single part rather than being split.

Of course, it all depends on the actual case at hand.


Very true.

The issue with having multiple databases is that many larger apps that can or should be broken up might still share the same database. A retail ERP might have separate apps to handle orders, customers, and stock, yet share a single database. You can create separate databases and handle communication internally, but you would give up a lot, like referential integrity, associations, and so on.

3 Likes

Good argument.

I understand your feeling, I sometimes have the same one, but there is one thing: the modules and the way you organize them completely define how your runtime will work. There is no way to totally isolate these two things, as they interact with each other deeply, even more so when you separate them into different apps.

In my current company, we have several apps in our umbrella project that use dbs; some of them share the same db, some don’t. As @axelson mentioned, not sharing the db brings up some problems, but sharing it brings up others (like the one the author is having). The solutions mentioned might bring yet other problems. And all of that why? Because you want to separate your modules into different apps, but keep the advantages of having the tables tied together as in a single app?

My real question is: why separate modules into different apps if at runtime they are still coupled? Organization, one might say. Well, I prefer an organized runtime over organized code, so if you are planning to do it incrementally, it would make more sense to decouple the runtime first. I can’t think of a way of organizing my runtime without a big refactor of the code though, and that’s my point: they are coupled to each other; you touch one, and you’re going to be touching the other one way or another.

3 Likes

Agreed, and that’s why I think the solution most of the time is to have one app and keep the code organized within it. Also, I’m sure there is a lot of code in an ERP not related to business logic that could be extracted to other apps to make the business-logic app leaner.

Maybe I can build up an imaginary scenario that is not an ERP system thingy:

  • the core domain shall be “cars”. Entities have properties like technical specifications, assembly parts, and so on; basically what the company does at its heart: building cars. A dev team will maintain the integrity of this core domain and its datasets.
  • the company has a marketing department, sponsoring another team to build a custom CMS/B2C web app. It has its own (distinct) entities like WYSIWYG pages and newsletter statistics, but also wants to use the exact data from the core domain, for example to build teaser/product pages automatically from the core data, enriched with content from its own entities
  • the company has a finance department, sponsoring another team to build some controlling dashboards. Again, there are entities distinct to the team but also a shared kernel with the core entities (keeping sales transactions for cars, parts, …, supplier management for parts, …)
  • to assist the C-suite, a business-intelligence team is formed, which wants to work with combined data like parts -> from_supplier-X -> user_feedback_in_marketing_for_car_using_part_from_supplier-X. Basically ad-hoc querying across domains; when the individual teams change their data structures, there is no problem aside from fiddling with some queries when needed again. Only the core domain must keep its integrity, for comparison over time, … reasons.
  • customers suddenly want a native mobile app from the company, and for various reasons the external app-building team wants a GraphQL API to access parts of the data from core and marketing. After the app ships, managers recognize that the API can be offered to other partners, paving the way to some kind of B2B API system, so a dedicated team is founded to work on it.

Yes, one could cut these things down into distinct services sharing nothing but an interface for communication, but in my case I really do want to do transactions across teams in some cases. Also, there are management reasons to have a monorepo, and a drastic shortage of DevOps people. Also, following DDD there is the concept of “persistence ignorance”, so after I modelled the domains and their (overlapping) entities, splitting things up nicely into umbrella apps, this is the last stumbling stone to getting things running.

Does this scenario help?

1 Like

In my view that scenario describes several distinct systems.

The one that sticks out is the BI system. Effective business intelligence usually requires an entirely different database schema, like a star schema. Running heavy analytic or reporting activity against the core business OLTP database is similar to the reach-in reporting anti-pattern for microservices.

I have a “core” domain, that is a shared kernel for other downstream domains

One has to be careful with identifying shared/common versus related.

what we call the “microlith”

Microservices are overhyped, but the idea behind them that deserves all the attention is bounded contexts, which are all about appropriately managing cohesion and coupling. Bounded contexts can be used inside monoliths; they just require a lot more discipline to maintain because of the lack of inherent separation that you get with microservices.

And when you are using service-style bounded contexts you will have apparent duplication. In the article, both Sales and Support have their own internal version of the Customer and Product entities. While they share the same name, the shape of the data will be different, as each needs entirely different details about the entities.
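For illustration (the schema names here are invented), the two contexts might model “their” customer like this, each persisting only the fields it cares about:

```elixir
# Sales cares about credit, Support cares about open tickets --
# same entity name, deliberately different shapes.
defmodule Sales.Customer do
  use Ecto.Schema

  schema "sales_customers" do
    field :name, :string
    field :credit_limit, :decimal
  end
end

defmodule Support.Customer do
  use Ecto.Schema

  schema "support_customers" do
    field :name, :string
    field :open_tickets, :integer
  end
end
```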

There is a natural tendency towards harmonizing data types, but that can be a recipe for disaster in large systems, due to tight coupling that unnecessarily exposes data consumers to details that are only relevant to others:

Other musings about how systems level DRY can lead to tight coupling between systems.

Integration Database:
As a result most software architects that I respect take the view that integration databases should be avoided.

Your Coffee Shop Doesn’t Use Two-Phase Commit

Compensating actions are more work, but most forms of decoupling are (which is why decoupling should only be applied where it pays off).

All that being said, you don’t want to overcomplicate your approach with unnecessary complexity from the very beginning if you only have simple needs right now and future needs are uncertain (as they tend to be). But there needs to be constant vigilance in monitoring technical debt, to catch just the right time to factor capabilities out into their own application that could potentially run on a separate node.

Don’t drink the kool aid

While umbrella projects provide a degree of separation between applications, those applications are not fully decoupled, as they are assumed to share the same configuration and the same dependencies.

So while umbrella projects provide a means for managing service-style bounded contexts, there is no real mechanism for encouraging loose coupling; it’s all up to developer discipline.

Poncho projects (parallel applications) encourage decoupling a bit more (which is why Dave Thomas prefers them) though they still can’t guarantee it.


One possible way to segregate the data for different bounded contexts within a single database might be to use schemas (@schema_prefix). If any DML references another bounded context’s schema, it signals that it is fishing in the wrong pond.
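A sketch of what that could look like (the module and schema names are invented):

```elixir
# Lives in the marketing app; its tables sit in the "marketing"
# Postgres schema, separate from the other bounded contexts' schemas.
defmodule Marketing.Newsletter do
  use Ecto.Schema

  @schema_prefix "marketing"

  schema "newsletters" do
    field :subject, :string
    field :open_rate, :float
  end
end
```

Queries through this schema then target `marketing.newsletters`, so cross-context access stands out in the generated SQL.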

4 Likes

Thank you for the interesting links, will read!

Of course dedicated systems are the straightforward solution for the scenario. But let’s assume the scope of the whole system is just right for an umbrella project, and the target vision is to have ultra-lightweight approaches to everything… so no full-blown enterprise BI team, but just a bunch of people experimenting with SQL once a week.

Yeah, I know that depending on the scenario an umbrella can get quite appealing, and I myself would maybe use umbrellas in a way that’s not my preferred way. I’m just saying that my preferred way is not to start creating different apps pointing to the same database from day one, because I know that the appealing options are sometimes not the best ones. And trust me, that comes from someone who has seen both worlds: a really messy monolith, and a really messy micro-servicy thing. I prefer the messy monolith.

Maybe the problems you are having because there is too much stuff in the same app are a lot bigger than the ones you would have by separating it, so you basically have to choose what to give up. The good thing about Elixir is that you can start monolith-first and then extract stuff to other apps as the project evolves.

About your example: there are things in it that would REALLY make no sense to keep in only one database. The first two, for example: why would you need the cars and the marketing stuff to be in the same db? The only thing marketing will ever do to that data is read it; no transaction needed. The same goes for the relation between the cars and finance parts. Actually, I see no advantage in having the data of all these services saved in the same db at all.

Well, the thing is: I consider using multiple umbrella apps to cut into distinct services. But without the “HTTP request only” bullshit. cough never do that cough cough

I really can’t see why you would need transactions across all of them. And for those cases, there are always ways to keep consistency between distinct databases and services.

You would still be able to use umbrellas and get the same monorepo with multi-db.

I know, and I agree with this, but there is nothing keeping you from doing the same with contexts inside a single app. I know separating the apps is a way to make it explicit that the things are different, and that’s good, but then again, I don’t see why you would want to keep the same db. If separation of concerns and “persistence ignorance” are so important, why is consistency of the data between two “persistence-ignorant” things so important?