DB Connection Pooling – Connecting on demand instead of holding connections open

mbklein · March 29, 2023, 2:43pm

I’m using an AWS Aurora PostgreSQL Serverless v2 database cluster with Ecto. It can auto-scale between 0.5 and 16 ACUs (Aurora Capacity Units), and is priced per ACU-hour. I’m currently refactoring my GenServers to avoid touching the database unless absolutely necessary (e.g., switching all periodic tasks that use the DB to things that are event-triggered).

Once that’s done, I’ll still have one problem – persistent connections. Even when completely idle, maintaining open connections to the cluster prevents it from scaling down as far as it could otherwise. For example, in my tests, shutting the app down entirely allows the DB to scale down to its minimum 0.5 ACU. But just starting the Repo process, even with no database activity, causes it to scale up to 4 ACU and stay there.

What I’d like is some kind of connection pool that looks to the consumer like a normal pool, but where each “connection” doesn’t actually open a socket until it needs to, and releases it after a given idle period. I found this thread from 7 years ago that seemed to be the beginnings of something like that, but it doesn’t seem to have gone anywhere and my other searches have come up empty.

Is this:

A solved problem? (If so, where can I find info on that?)
A terrible idea? (I get why persistent connections are desirable, but my need for auto-scaling outweighs my need to minimize connection latency.)
Something else?

With serverless/auto-scaling/pay-as-you-go infrastructure becoming more and more prevalent, I think something like this (either the implementation of it, if it doesn’t exist, or some easy-to-find documentation of it, if it does) would be a big benefit to people targeting cloud deployment.

cjbottaro · March 30, 2023, 3:30pm

How apropos. I started that thread way back then, but gave up on that idea because we stopped using a “database per tenant” and moved to using a “schema per tenant” and that way we were able to reduce the number of databases down to 6, thus making the consistent connection problem a non-issue.

It’s apropos because we’re about to move to Aurora, specifically for the auto scaling of replicas. Very interesting about connections causing it to scale up. I would have thought it would be triggered off of CPU usage. And disappointing to hear that holding connections causes it to not scale down.

Also, I was under the impression that you’re not supposed to connected to the db or replicas directly, but instead go through a proxy (some kind of modified PgBouncer type thing) which allows holding a persistent connection, but routes (transaction) requests to an active replica.

cjbottaro · March 30, 2023, 3:39pm

Ahh, never mind about the proxy stuff. We’re going to use Aurora RDS, I didn’t catch that you’re using Aurora Serverless.

But hmm, when you think about… PgBouncer is essentially a backend connection cache. The frontend connections (to your app) are persistent. But the backend connections (to your database) are disconnected after a configurable idle timeout. Just make sure it’s in “transaction mode” and not “session mode”!

mbklein · March 30, 2023, 9:31pm

Aurora Serverless v2 actually does support proxies, so maybe that’s the way to go. I’ll also look into PgBouncer. It’ll be great not to have to figure it out on the DbConnection side.

Thanks!

metabren · February 11, 2024, 11:27pm

Hey @mbklein did you figure this one out? We’re also evaluating our options and Aurora Serverless v2 looks attractive, if it can work as one would expect (scaling down & up when necessary).

mbklein · March 25, 2024, 2:48am

Not really – we ended up with Aurora Serverless v2, which doesn’t support scaling down to 0, so maintaining the connection isn’t an issue. But running pgbouncer in the same container task as the elixir app would probably do the trick.

metabren · April 16, 2024, 11:33pm

Thanks. We ended up trying a couple of scenarios: Aurora Serverless v2 w/ RDS Proxy then without. In the end there was little need for RDS Proxy. Furthermore, in the end our DB activity is a little more spiky than previous thought (which ends up in more ACU usage than anticipated), so we’ll be switching to a more standard Aurora setup soon.