The current project I’m working on is an umbrella project that has been maintained for about 4 to 5 years. It was designed so that each umbrella app would have its own `Repo`, i.e., its own database (this isn’t entirely true: some apps don’t have a `Repo`). But now we have about 30 apps, and the number keeps growing. The first thing that already broke is the tests: testing the entire umbrella at once almost always exhausts the database connections. Moreover, we have data dependencies between apps; sometimes we need to intersect two tables, and getting the data is a pain. The project has a bunch of queues, but it is not an event-driven architecture: the queues are mostly there to run background jobs smoothly.
I have a few questions:
- Can this separation of databases (each app with its own `Repo`) in an umbrella project be done correctly? If so, how?
- How should we deal with split data in an umbrella project like this, given that we don’t have a strong event architecture and there is old structured data already in the databases?
- If we move the scattered `Repo`s into one app that acts as the database “guardian”, how do we deal with the database becoming a single point of failure?
- And finally, is spreading a `Repo` across all the apps a good practice for keeping bounded contexts, like microservices do?
Strong opinions ahead.
- Almost never aim for microservices. You’re not Google, and the odds of your app needing separate databases are extremely small. Whoever did this did you a disservice, and you should work hard to undo it. Use one database and one `Ecto.Repo`, and write wrapping modules that handle all data access (including pulling data from more than one database, at least until you merge everything into one). That should be your very first step.
- Tests overwhelming the DB should be easy to fix since they run locally. You can easily give the test repo 100 connections to use, or you can remove `async: true` from test files. Either can work.
- It’s a good idea to have only one app / namespace that deals with data when you have this many apps with different responsibilities to account for. Ideally, the apps calling the “data guardian” app will know nothing at all about the underlying storage. As an app grows, that becomes the sensible choice. I.e., none of the other apps will ever construct an `Ecto.Query` themselves or use the `Ecto.Repo` directly. They’ll do something like `OurApp.Storage.Order.find_pending(limit: 50)`, for example, and only the `OurApp.Storage` namespace will use `Ecto` functions. Nobody else. The other apps should not even list `:ecto` as a dependency.
Bonus points: convert the `Ecto.Schema` structs returned by Ecto functions into other structs. This one is 50/50: architecturally it can save you some headaches, but I’ve very rarely seen this level of paranoia translate into actual savings of programmer nerves and time. Use your own judgement here.
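For the first point, here is a minimal sketch of a wrapping module that hides the fact that data still lives in two databases while you migrate. All names (`OurApp.SalesRepo`, `OurApp.CrmRepo`, the two schemas, the `customer_id` field) are hypothetical. The “join” happens in memory because a single SQL JOIN usually cannot span two separately configured databases.

```elixir
defmodule OurApp.Storage.Reports do
  @moduledoc """
  Hypothetical wrapper: the only module that knows orders and customers
  still live in two different databases (two Repos).
  """

  # Callers use this function and never touch either Repo directly.
  def orders_with_customers do
    # Assumed Repos and schemas; Repo.all/1 accepts a schema module as a queryable.
    orders = OurApp.SalesRepo.all(OurApp.Sales.Order)
    customers = OurApp.CrmRepo.all(OurApp.Crm.Customer)
    join_by_customer(orders, customers)
  end

  @doc "In-memory join standing in for a SQL JOIN across two databases."
  def join_by_customer(orders, customers) do
    # Index customers by id once, then look each order's customer up in O(1).
    customers_by_id = Map.new(customers, &{&1.id, &1})

    Enum.map(orders, fn order ->
      %{order: order, customer: Map.get(customers_by_id, order.customer_id)}
    end)
  end
end
```

Once everything is merged into one database, only the body of `orders_with_customers/0` has to change (to a real JOIN); its callers stay the same.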
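For the test-connection point, a minimal `config/test.exs` sketch that raises the test Repo’s pool size. `:our_app` and `OurApp.Repo` are placeholder names; `pool` and `pool_size` are standard Repo options.

```elixir
# config/test.exs (placeholder app and Repo names)
import Config

config :our_app, OurApp.Repo,
  # The SQL sandbox isolates each test's changes inside a transaction.
  pool: Ecto.Adapters.SQL.Sandbox,
  # Connections this Repo may open during `mix test`. With several Repos,
  # the sum across all of them must stay within the database server's
  # connection limit (for example PostgreSQL's max_connections).
  pool_size: 100
```

The other option, removing `async: true`, simply means test modules run one at a time instead of concurrently, which lowers peak connection use at the cost of slower test runs.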
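For the “data guardian” point, a minimal sketch of what `OurApp.Storage.Order.find_pending(limit: 50)` could look like inside the storage app. It assumes a single `OurApp.Storage.Repo`, a hypothetical `OurApp.Storage.Schemas.Order` schema with `status` and `inserted_at` fields, and that `ecto_sql` is a dependency of this app only.

```elixir
defmodule OurApp.Storage.Order do
  @moduledoc """
  Hypothetical storage-facing module: the only layer that builds Ecto queries.
  Other apps just call `OurApp.Storage.Order.find_pending(limit: 50)`.
  """
  import Ecto.Query, only: [from: 2]

  # Assumed names: one Repo and an Ecto schema with `status` and timestamps.
  alias OurApp.Storage.Repo
  alias OurApp.Storage.Schemas.Order, as: OrderSchema

  @default_limit 50

  def find_pending(opts \\ []) do
    limit = Keyword.get(opts, :limit, @default_limit)

    query =
      from o in OrderSchema,
        where: o.status == "pending",
        # Stable ordering so the limit always returns the oldest pending orders.
        order_by: [asc: o.inserted_at],
        limit: ^limit

    Repo.all(query)
  end
end
```

Callers pass plain options and get results back; the `Ecto.Query` never leaves the module. Other umbrella apps would then depend on the storage app (for example `{:our_storage, in_umbrella: true}`, with `:our_storage` a hypothetical app name) instead of on `:ecto`.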
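For the bonus point, one way to convert schema results into plain structs before they leave the storage layer. `OurApp.Domain.Order` and its fields are hypothetical; `from_schema/1` accepts any map, so an Ecto schema struct works as input too.

```elixir
defmodule OurApp.Domain.Order do
  @moduledoc """
  Hypothetical plain struct handed to other apps instead of the Ecto schema,
  so Ecto metadata (`__meta__`, association placeholders) does not leak out.
  """
  @fields [:id, :status, :total]
  @enforce_keys [:id, :status]
  defstruct @fields

  # Copies only the public fields; struct!/2 raises if an enforced key is missing.
  def from_schema(%{} = record) do
    struct!(__MODULE__, Map.take(record, @fields))
  end
end
```

Whether this extra mapping code pays off is exactly the 50/50 judgement call described above.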
Perhaps this is too general, but that’s what I can offer right now. If you have any other questions, shoot.
About (3), having only one “guardian of data”: we expect a large flow of data for reads, writes, and reports, and some of that flow is already a reality. My doubt is: won’t this app become a “single point of failure”? And how do we deal with that?
P.S. I think this is why, many years ago, the team chose to spread databases across the apps.
Thanks, your opinion will be very valuable to me.
It absolutely will, and that’s a good thing. You’ll have one place to look when such problems arise. Spreading data among several databases only makes problems harder to fix.
You will not remove all problems right away. But you’ll have an easier time fighting them off.