More than one Repo module for the same database

Is it an OK solution to have more than one Repo module pointing to the same database?

Use Case

In my system, data is inserted into and read from the database in large quantities by internal processes that are independent of clients.

This means that a process (or several processes) will pull data from third-party APIs, process it, and insert it into the database.

On the other side, clients (users that access my system via the Phoenix endpoint) can only read this data, never write it.

The issue I’m having is that these processes can hit the database heavily and concurrently, so I regularly get timeout errors.

The obvious solution is to increase :queue_target and :queue_interval. That seems like the correct fix, since I do expect these processes’ queries to take a long time, and I don’t mind if they block other internal processes for a long time either.

The issue is that they will also block users trying to read from the database, since they will use up all the available connections in the Repo’s pool.

Possible Solution

So, what I have in mind to fix that is to have 2 Repos for the same database: one read-only Repo that is only used by clients, and another read-write Repo that is used by my internal processes.

That way I can give the processes’ Repo a very large :queue_target and :queue_interval so they can take their sweet time to finish their work. And, at the same time, the clients get their own Repo, so they will always have pool connections available and I can keep the sane default :queue_target and :queue_interval values.
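To make this concrete, here is roughly what I have in mind (module names, env vars, and numbers are just placeholders, and both Repos would be added to the application’s supervision tree):

```elixir
# lib/my_app/repo.ex - read-write Repo used by the internal processes
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end

# lib/my_app/client_repo.ex - read-only Repo used by clients via Phoenix
defmodule MyApp.ClientRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres,
    read_only: true
end

# config/runtime.exs - both Repos point at the same database, but each
# gets its own pool; only the internal Repo gets the relaxed queue settings
config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  pool_size: 10,
  queue_target: 5_000,
  queue_interval: 30_000

config :my_app, MyApp.ClientRepo,
  url: System.get_env("DATABASE_URL"),
  pool_size: 10
```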

Is this a good solution for this problem? Would you tackle it in another way?

Thanks!

Create 2 DB instances: a leader with read-write access and a hot replica that is used for reads. Then connect separate Repos to the different DBs.
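Roughly like this on the config side (hypothetical module names and env vars, assuming Postgres):

```elixir
# Read-write Repo pointed at the leader
config :my_app, MyApp.Repo,
  url: System.get_env("PRIMARY_DATABASE_URL")

# Read-only Repo pointed at the hot replica
config :my_app, MyApp.ReplicaRepo,
  url: System.get_env("REPLICA_DATABASE_URL")
```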

1 Like

Thanks for the suggestion. Do you know if creating this second DB instance would have some impact on performance/resource consumption?

Yeap! You didn’t mention if you’re explicitly thinking about a read-replica on the database side or just a logical separation of concerns, but in case the former is the goal or is an option, there is actually a guide in the documentation for this pattern on the Ecto HexDocs here. Many of the suggestions within apply to your scenario either way, IMO.

Maybe and yes respectively - it will be an entirely separate connection pool, and the size of this pool can be tuned independently if you don’t need or want 2x the outbound SQL connections.

You can probably anticipate some extra work around testing if you make use of the popular Ecto.Adapters.SQL.Sandbox facilities, which some of the generators include by default. There’s a heading in my HexDocs link above about this, but I wanted to call it out specifically because it follows from the connection pools being separate: insertions performed inside an uncommitted transaction on one connection aren’t visible when querying through another connection entirely.
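For instance, a sketch of the test case setup with two Repos (reusing the hypothetical Repo names from above, and assuming both are configured with pool: Ecto.Adapters.SQL.Sandbox in config/test.exs):

```elixir
defmodule MyApp.DataCase do
  use ExUnit.CaseTemplate

  # Each Repo has its own pool, so each one needs its own sandbox owner
  setup tags do
    repo_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
    client_pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.ClientRepo, shared: not tags[:async])

    on_exit(fn ->
      Ecto.Adapters.SQL.Sandbox.stop_owner(repo_pid)
      Ecto.Adapters.SQL.Sandbox.stop_owner(client_pid)
    end)

    :ok
  end
end
```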

4 Likes

Thanks for the reply.

You didn’t mention if you’re explicitly thinking about a read-replica on the database side or just a logical separation of concerns

For now, it would just be a logical separation, because I’m having issues when my processes hold all the available connections from the pool for some time, blocking user interaction with the database. That’s why I wanted to split the connection pool in two, so I can isolate their usage.

Maybe and yes respectively

I asked the performance question because this is a system I’m building for my startup, so initial cloud costs are a main concern.
Creating a second DB instance seems like the more elegant approach, but if it increases my costs it’s something I would rather do later, when I can invest more money into server hardware.
I will take a good look at it anyway, since it looks like a must-have feature for scaling a system.

Thanks!

I’ve done this. In my case it was a set of 4 new tables for a new feature. We wanted logical separation so we could easily split into a separate db instance later. It worked fine. We later chose to create that separate db instance and that step was quite easy.

1 Like