In a Phoenix app, we will be serving multiple schools, and each school has its own DB.
Using the same routes, we have to pick a specific DB depending on, for example, the school_id in the request.
The question is: what is the recommended way to accomplish this with Ecto, given that we are using Repo.insert_all?
Also, when does an Ecto repo get connected? Is it when the Phoenix server starts, and does it stay connected after that, or does it connect on each request?
There are two ways of doing what you want. The first is to open N connections for N schools. They will be persisted, each one with its own connection pool. You'll have to refer to each by its own Repo module, for example School1Repo, School2Repo etc. This blog post explains in detail how to do it:
This may work if you have a fixed number of schools.
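Roughly, and only as a sketch (the module names and the repo-picking helper are made up, and each repo still needs its own database entry in config), the multi-repo approach could look like this:

```elixir
# Ecto 3-style repos, one per school database (hypothetical names).
defmodule MyApp.School1Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres
end

defmodule MyApp.School2Repo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres
end

defmodule MyApp.Tenancy do
  # Maps the school_id from the request to the repo owning that school's DB.
  def repo_for(1), do: MyApp.School1Repo
  def repo_for(2), do: MyApp.School2Repo
end

# In a controller or context:
#   repo = MyApp.Tenancy.repo_for(school_id)
#   repo.insert_all("students", rows)
```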
Generally, if you use PostgreSQL, there is a better solution for multitenancy. PostgreSQL supports schemas, and Ecto has built-in schema prefix support. This can be combined with a good helper library: https://github.com/Dania02525/apartmentex
With the second approach, you use a single database, but it is logically partitioned using PostgreSQL schemas.
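For reference, here is a minimal sketch of the prefix approach, assuming Ecto 3 where repo operations accept a `:prefix` option (the schema name "school_123" is just an illustration); Apartmentex builds on the same mechanism and adds helpers for creating and migrating tenant schemas:

```elixir
import Ecto.Query

# Query a table inside the tenant's PostgreSQL schema via :prefix.
students =
  from(s in "students", select: %{id: s.id, name: s.name})
  |> MyApp.Repo.all(prefix: "school_123")

# insert_all also honours :prefix, so writes land in the same schema.
MyApp.Repo.insert_all("students", [%{name: "Ada"}], prefix: "school_123")
```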
However, Ecto has no capability whatsoever for joining across schemas (and actively prevents it), so you cannot, for example, have one schema holding all account-related information and then join it to other parts.
You mean tasks as in Mix tasks? In that case, yes. Each one would start its own Erlang VM instance, load the application and handle its own pool, if started from the shell. I believe you can work around this by starting Mix tasks from within an already started application, but you'd need Mix in either case (and it's often not available).
If you need to run some tasks on production, periodically, from the same VM instance, sharing the same connection pool as the web app, I'd recommend you have a look at the quantum library:
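A rough idea of what a quantum setup looks like (the job and module names are invented, the exact `use` line differs between quantum versions, and the scheduler also has to be added to your supervision tree):

```elixir
# config/config.exs - a hypothetical nightly job
config :my_app, MyApp.Scheduler,
  jobs: [
    # run MyApp.Reports.generate/0 every day at 02:00
    {"0 2 * * *", {MyApp.Reports, :generate, []}}
  ]

# lib/my_app/scheduler.ex
defmodule MyApp.Scheduler do
  use Quantum, otp_app: :my_app
end
```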
Sorry, I meant tasks as in Task.async, not Mix tasks. Think of multiple async tasks running, each one needing to store a record in the DB. Would there be a connection for each write to the DB? And then, how would those connections get closed?
There's a pool of connections. When a process needs to access the database, it checks out a connection, does its business and checks the connection back into the pool once finished.
Connections are not closed, unless an error happens - they are kept alive for as long as possible since establishing a connection is a costly process.
The way you describe is how the connection pool works in Rails/ActiveRecord. But Ecto is different: it does not keep a connection reserved for a particular process throughout the whole life of the processes that use it.
Instead, Ecto checks a connection out of the pool only when it wants to perform an operation, and immediately afterwards checks it back into the pool.
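To illustrate (just a sketch, not anything from your code): each task below borrows a connection from MyApp.Repo's pool only for the duration of its insert, so concurrency is effectively bounded by the repo's :pool_size rather than by opening new connections per task:

```elixir
# Every Task.async process checks a connection out of the repo's pool
# only while insert_all runs, then the connection goes straight back.
rows
|> Enum.map(fn row ->
  Task.async(fn -> MyApp.Repo.insert_all("events", [row]) end)
end)
|> Enum.map(&Task.await/1)
```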
While schemas in PostgreSQL are appealing for multitenancy, it's not recommended at all, because it can cause backups to take a long time if you have a large number of schemas, tables and data.
The most common use case for using multiple schemas in a database is building a software-as-a-service application wherein each customer has their own schema. While this technique seems compelling, we strongly recommend against it as it has caused numerous cases of operational problems. For instance, even a moderate number of schemas (> 50) can severely impact the performance of Heroku's database snapshots tool, PG Backups.
While this specifically says that's the case for Heroku's tool, they use pg_dump, which also won't back up large schemas in a timely fashion. So that rules out schemas.
Judging by what's been discussed in this thread, it looks like the multi-database approach is out too.
So what I recommend is that you look at how you could accomplish this using standard foreign key scoping. If you're rigorous enough in your testing that the scoping is in place where it needs to be, I don't see any problem with that approach.
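A sketch of what that scoping can look like (schema and field names are made up): route every query through one helper that adds the tenant's foreign key, so it cannot be forgotten:

```elixir
defmodule MyApp.Scoped do
  import Ecto.Query

  # Constrain any queryable to a single school via its foreign key.
  def for_school(queryable, school_id) do
    from(q in queryable, where: q.school_id == ^school_id)
  end
end

# Usage:
#   MyApp.Student |> MyApp.Scoped.for_school(school_id) |> MyApp.Repo.all()
```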
Schemas at my past few jobs have been used to separate "concerns", so there is a limited number of them; however, being able to join across them is expected, which Ecto is utterly broken on…
Nah, for actual scaling I use PostgreSQL with a proxy that routes connections out to the different databases behind it.
Ah, I never saw that because our backups are configured to back up each schema individually, which seems to solve the problem you describe at the same time.
I don't think that a slow backup process should be the definitive reason to declare this approach "not recommended". I would go with this by default and solve any problems I may have later.
I've got a couple of dozen accounts, with some schemas as big as 3 GB. The luck factor in my case may be that those are business users from the same time zone, and we do backups at midnight, when there is very little traffic. Or you simply have way more users/data to back up :).
We're also doing it simply because clients want to have these daily backups accessible at any time, and to have an off-site backup. We do have another layer of backup for our own purposes at a different level: archiving WAL entries continuously with wal-e https://github.com/wal-e/wal-e.
This is brilliant, since it allows point-in-time recovery of the whole database. We had a security incident a few years ago where this helped a lot in tracking down what happened.
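In case it helps anyone, the wal-e side is essentially the archive_command hook from its README; the env dir path below is just the usual example, not our exact setup:

```
# postgresql.conf - continuous WAL archiving via wal-e
wal_level = archive            # or hot_standby
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
archive_timeout = 60
```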