Database separation on umbrella

ntd23 · July 18, 2022, 10:04pm

The project I’m currently working on is an umbrella project that has been maintained for around 4 to 5 years. It was designed so that each umbrella app would have its own Repo, i.e., its own database (this is not entirely true; some apps do not have a Repo). But now we have about 30 apps and growing. The first thing to break was the tests: testing the entire umbrella at once almost always exhausts the database connections. Moreover, we have data dependencies between apps. Sometimes we have to combine two tables that live in different databases, and it is a pain to get the data. The project has a bunch of queues, but it is not an event-driven architecture; the queues are mostly there to make background jobs run smoothly.

I have some questions:

1. Can this separation of databases (each app with its own Repo) in an umbrella project be done correctly? If yes, how?
2. How do we deal with split data in an umbrella project like this, given that we don’t have a strong event architecture and there is old structured data already in the databases?
3. If we move the scattered Repos into one app that becomes the database “guardian”, how do we deal with the single point of failure that it will become?
4. And lastly, is spreading Repos across all the apps a good practice for keeping bounded contexts, as with microservices?

dimitarvp · August 6, 2022, 12:51am

Strong opinions ahead.

1. Don’t aim for microservices, almost ever. You’re not Google and the odds of the app needing separate databases are extremely small. Whoever did this did you a disservice and you should work hard to undo it. Use one database and one Ecto.Repo and make wrapping modules that handle all data access (including pulling data from more than one database, at least until you merge everything into one). That should be your very first step.
2. Tests overwhelming the DB should be easy to fix since they run locally. You can easily give the test repo 100 connections to use. Or you can remove async: true from test files. Both can work.
3. It’s a good idea to have only one app / namespace that deals with data if your apps are so many and there are different responsibilities to account for. Ideally all apps calling the “data guardian app” will not know anything at all about the underlying storage. When an app grows, that becomes the sensible choice. I.e. none of the other apps will ever construct an Ecto.Query by themselves or use the Ecto.Repo directly. They’ll do something like OurApp.Storage.Order.find_pending(limit: 50), for example. And only the OurApp.Storage namespace will use Ecto functions. Nobody else. The other apps should not even have :ecto in their mix.exs file.
4. Bonus points: convert the Ecto.Schema results returned by Ecto functions to other structs. This is 50/50; architecturally it can save you some headache but I’ve very rarely seen this level of paranoia materialize in actual savings of programmer nerves and time. Use your own judgement here.
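
To make the “data guardian” idea a bit more concrete, here is a minimal sketch of what such a namespace could look like. Only OurApp.Storage.Order.find_pending(limit: 50) comes from the post above; the :storage app name, the Repo and schema module names, the orders table, its status column, and the Postgres adapter are all assumptions made up for illustration.

```elixir
# Lives in the one app that depends on :ecto_sql. All names besides
# OurApp.Storage.Order.find_pending/1 are hypothetical.

defmodule OurApp.Storage.Repo do
  # Assumes an OTP app called :storage and a Postgres database.
  use Ecto.Repo, otp_app: :storage, adapter: Ecto.Adapters.Postgres
end

defmodule OurApp.Storage.Schemas.Order do
  use Ecto.Schema

  # Assumed table layout: an "orders" table with a string status column.
  schema "orders" do
    field :status, :string
    timestamps()
  end
end

defmodule OurApp.Storage.Order do
  @moduledoc """
  The only module other apps call to read orders. Queries are built
  here, so callers never touch Ecto.Query or the Repo.
  """
  import Ecto.Query, only: [from: 2]

  alias OurApp.Storage.Repo
  alias OurApp.Storage.Schemas.Order, as: OrderSchema

  @doc "Returns up to `:limit` pending orders (default 50), oldest first."
  def find_pending(opts \\ []) do
    limit = Keyword.get(opts, :limit, 50)

    query =
      from o in OrderSchema,
        where: o.status == "pending",
        order_by: [asc: o.inserted_at],
        limit: ^limit

    Repo.all(query)
  end
end
```

Any other app would then call OurApp.Storage.Order.find_pending(limit: 50) and depend only on the storage app, not on Ecto. If you go for the “bonus points” variant, find_pending/1 is also the natural place to map the schema structs to your own plain structs before returning them.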

Perhaps this is too general but that’s what I can offer right now. If you have any other questions, shoot.

ntd23 · August 6, 2022, 10:32pm

About (3), having only one “guardian of data”: we expect to handle a large flow of data for reads, writes, and reports, and some of that flow is already a reality. One doubt I have: won’t this app become a “single point of failure”? And how do we deal with that?

PS: I think this is why, many years ago, the team opted to spread databases across the apps.

Thanks, your opinion is very valuable to me.

dimitarvp · August 6, 2022, 10:45pm

It absolutely will and that’s a good thing. You’ll have one place to look at in case of such problems. Spreading data among several databases only makes it harder to fix a problem.

You will not remove all problems right away. But you’ll have an easier time fighting them off.

ntd23 · August 9, 2022, 1:24pm

One more thing about this: in the case of a large app with multiple domains and some hundreds of models (schemas in this case), not large at Google scale but large enough to make a good mess, isn’t it worth splitting it into 2 or 3 different data guardians?

dimitarvp · August 9, 2022, 1:58pm

Sure it can be rational or even preferable but it always Depends™.

If you have e.g. three completely different apps that only occasionally need to work together then separating their databases is the right thing to do.

But if you find yourself regularly having to do cross-database work then the design is not good.

jeremyjh · August 9, 2022, 3:15pm

The typical way to solve this problem is with module structures. Having multiple OTP apps with the exact same dependencies that are always deployed together doesn’t provide any real benefit over just having a hierarchy composed of modules in folders.

dimitarvp · August 9, 2022, 3:32pm

…and using the excellent boundary library!
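
As a rough sketch of how the two suggestions above could combine, the boundary library lets plain modules inside a single app declare what they depend on and what they expose, and warns at compile time when a call crosses those lines. The example assumes the library is added as a dependency and its :boundary compiler is enabled in mix.exs as described in its documentation. OurApp.Billing is a hypothetical caller, and OurApp.Storage.Order stands for the data-access module discussed earlier in the thread.

```elixir
defmodule OurApp.Storage do
  # The storage boundary depends on no other boundary of ours and
  # exposes only OurApp.Storage.Order (exports are relative to this module).
  use Boundary, deps: [], exports: [Order]
end

defmodule OurApp.Billing do
  # Hypothetical domain boundary: it may call what OurApp.Storage exports.
  # Reaching into other OurApp.Storage.* modules (a Repo or schema, say)
  # from here would be reported as a boundary violation when compiling.
  use Boundary, deps: [OurApp.Storage], exports: []
end
```

This keeps the separation that the umbrella apps were meant to provide while everything compiles and deploys as one application.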