What is the purpose of having multiple Repos in Ecto?

floatfoo · February 6, 2022, 9:13pm

Hi! I read some docs on Ecto and have a question about Ecto Repos: what's the idea of having several repos?
In the config/config.exs file I found this:

ecto_repos: [App.Repo]

so I can use several Repos in one app. My general guess is that we can use one repo for users, another for posts, another for groups, and so on. Is that the right pattern?

cmo · February 6, 2022, 9:55pm

No. That is generally how you would divide things into tables within a Repo. It would be rather inefficient to have a database for each table, and less fun creating relationships between them.

A Repo is a database instance. If you want to talk to multiple databases you use multiple Repos.
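As a rough sketch of what that looks like in configuration, assuming a hypothetical app named :my_app with a second, hypothetical MyApp.AnalyticsRepo pointing at a separate database (the database names and hostname are made up for illustration):

```elixir
# config/config.exs
import Config

# Every repo listed here is picked up by mix tasks such as
# `mix ecto.create` and `mix ecto.migrate`.
config :my_app, ecto_repos: [MyApp.Repo, MyApp.AnalyticsRepo]

# Each repo gets its own connection settings (hypothetical values).
config :my_app, MyApp.Repo,
  database: "my_app_main",
  hostname: "localhost"

config :my_app, MyApp.AnalyticsRepo,
  database: "my_app_analytics",
  hostname: "localhost"
```

Each repo module also has to be started in the application's supervision tree, the same way the default MyApp.Repo is.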

sergio · February 6, 2022, 9:57pm

Not really. One use case would be you have a read-only Repo meant for gnarly queries. Your database writes would go to your main repo. That way the users that are inserting data get a nice experience and you can optimize reading the data separately.

MyApp.Repo
MyApp.ReadonlyRepo
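A minimal sketch of how those two modules might be defined, assuming the Postgres adapter and an OTP app called :my_app (both are assumptions, not something stated above). The read_only: true option tells Ecto not to define the write functions (insert, update, delete and friends) on that repo:

```elixir
defmodule MyApp.Repo do
  # Main repo: handles all writes.
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end

defmodule MyApp.ReadonlyRepo do
  # Read-only repo: write functions are not defined on this module,
  # so an accidental write fails instead of reaching the database.
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres,
    read_only: true
end
```

Heavy reporting queries would then go through MyApp.ReadonlyRepo.all/2, while inserts and updates keep using MyApp.Repo.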

csadewa · February 7, 2022, 2:39am

What's the idea of having several repos?

It can be used to support a Modular Monolith approach to software development. The idea is that developers decompose the software system into multiple components that share nothing and can therefore be developed independently. In this approach, having a different DB (or at least a different DB schema in Postgres) for each software component (usually based on a Bounded Context from Domain-Driven Design) is considered preferable, so that each component can evolve and use its database independently.

Alternatively, in a Monolith-to-Microservice migration there are a number of transition steps (one of them is similar to a Modular Monolith), and having multiple repos helps in those cases.

stefanchrobot · February 7, 2022, 8:50am

You might also want to use multiple repos to have multiple DB connection pools.

sbuttgereit · February 7, 2022, 2:20pm

I’m using multiple repos for a few different purposes.

The first is that I'm implementing multi-tenancy based on different databases. A single repo can only connect to a single database at a time. So in this case the application server will accept sessions for many tenants and use a tenant-specific repo instance to access that tenant's database. There is also a general-purpose management database used in system-wide administrative functions, which naturally has its own repo, too.
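One way to get a tenant-specific repo instance is Ecto's dynamic repositories. The sketch below follows the with_dynamic_repo pattern from the Ecto guides; the :my_app name, the Postgres adapter, and the shape of the credentials keyword list are assumptions for illustration, not a description of my actual setup:

```elixir
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  # Starts an unnamed repo instance with tenant-specific connection
  # options, runs `callback` against it, then restores the previous
  # dynamic repo and stops the temporary instance.
  def with_dynamic_repo(credentials, callback) do
    default_dynamic_repo = get_dynamic_repo()
    start_opts = [name: nil, pool_size: 1] ++ credentials
    {:ok, repo} = MyApp.Repo.start_link(start_opts)

    try do
      # put_dynamic_repo/1 only affects the calling process.
      MyApp.Repo.put_dynamic_repo(repo)
      callback.()
    after
      MyApp.Repo.put_dynamic_repo(default_dynamic_repo)
      Supervisor.stop(repo)
    end
  end
end
```

A call might look like MyApp.Repo.with_dynamic_repo([database: "tenant_acme"], fn -> MyApp.Repo.all(MyApp.Invoice) end), where the database name and the MyApp.Invoice schema are hypothetical. Starting a pool per call is expensive, so a long-running system would more likely keep one named instance per tenant alive and only call put_dynamic_repo/1 per session.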

Next, I'm using some of the database server's security features for coarse-grained controls. The database server itself knows which security rules to apply based on the database role of the connection, and a repo only connects to the database as a single role, so each database role must also be a different repo in the application server. For example, processes that serve public-facing APIs may not need all the access that, say, an administrative user at the UI might need, so a flaw in the application server doesn't necessarily become a privilege escalation at the database.

Finally, I can do some resource management as well. Since I'm using PostgreSQL, where the number of connections per server can be a serious limitation, I want to be sure that some application services are less likely to be degraded because other, less sensitive services got busy. For example, I want interactive user sessions to be more responsive and fluid than, say, a public-facing API layer accessed by other automated systems, and I'm expecting more interactive users than API sessions. I can have two repos in this case, each with a dedicated pool of connections appropriate for its usage pattern; if the API usage hits a saturation point, that doesn't necessarily impact my interactive user sessions, because they aren't competing with the API for database access.
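In configuration terms, that separation could be sketched roughly as follows. The repo names MyApp.InteractiveRepo and MyApp.ApiRepo and the pool sizes are hypothetical; both repos may point at the same database and differ only in their pools:

```elixir
# config/runtime.exs
import Config

# Larger pool for interactive user sessions (hypothetical size).
config :my_app, MyApp.InteractiveRepo,
  database: "my_app",
  pool_size: 20

# Smaller, separate pool for the public API layer. If these
# connections are all busy, API requests queue up without taking
# connections away from the interactive pool.
config :my_app, MyApp.ApiRepo,
  database: "my_app",
  pool_size: 5
```

The sum of all pool sizes across all application nodes still has to fit under the PostgreSQL server's connection limit.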

Anyway, I wouldn’t suggest going off and doing all those things “just because”. I have specific use cases and usage patterns in mind and have taken a fair amount of time to understand what the trade-offs will be for the decisions above (and there are trade-offs). But at least this should give you some ideas as to why multiple repo support could be useful.

[Addendum] The specific question seemed to ask about multiple MyApp.Repo modules, and what I described was dynamic repos, which don't require different MyApp.Repo modules. I did use the multiple-module capability earlier: highly privileged database connections had to be accessed via a different Repo module than non-privileged database connections; the idea was that this made it clearer to an application server developer what they were doing and was a better expression of intention. In the end, I didn't go down that path because I had better ways to handle it without. Anyway, I wanted to be sure to address the original question a bit better.

AlfredBaudisch · February 7, 2022, 2:27pm

You use different databases for different domains or when you need one database for the main business data and another for logging, for example.

In my current work, we have a database for the main data (companies, users, time tracking, etc) and one for logging GPS coordinates (currently with 490 million rows).

I decided to split into two databases because the GPS coordinates are not business-critical and the backup of the data was taking too much time, so by splitting into two databases, I set up different backup strategies and it’s much easier (and faster) to maintain the main database.

The Elixir application therefore has two Ecto.Repo modules.

hubertlepicki · February 7, 2022, 2:35pm

The usual way I use this is when I have a more complicated database setup. For example, if I have one master (write) database and then a replica or replicas, I would create two or more Ecto repos: one for writing to and querying the master database, and the other one(s) for the replicas.

There are also cases where it can be useful to separate databases: for reporting purposes you can have a denormalized database, for example, or you may have an app that integrates with some legacy database, and so on.
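A sketch of that layout, following the replica pattern shown in the Ecto guides. The replica module names, the :my_app name, and the random selection strategy are assumptions here, not necessarily what I run in production:

```elixir
defmodule MyApp.Repo do
  # Repo for the master (write) database.
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  @replicas [MyApp.Repo.Replica1, MyApp.Repo.Replica2]

  # Picks one of the replica repos at random for a read query.
  def replica do
    Enum.random(@replicas)
  end

  # Defines one read-only repo module per replica.
  for repo <- @replicas do
    defmodule repo do
      use Ecto.Repo,
        otp_app: :my_app,
        adapter: Ecto.Adapters.Postgres,
        read_only: true
    end
  end
end
```

Reads that can tolerate replication lag then go through MyApp.Repo.replica().all(query), while writes keep going through MyApp.Repo directly.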

ityonemo · February 8, 2022, 8:04am

I work for an ML company. We have one repo for active data and one repo for training data; data are periodically moved from active to training, and the ML practitioners only see the training data.