Amazon RDS load balancing with Ecto & DBConnection pool

jvoegele · October 18, 2019, 2:48pm

We have an application hosted on AWS Fargate that uses AWS RDS for a managed database service. We have configured the RDS to use four reader replicas (in addition to the one writer) and we have two Fargate tasks with 40 DB connections apiece that query the readers. Our expectation is that Ecto queries and DB connections would be fairly evenly spread out across the four readers.

However, what we have found is that RDS does not evenly spread out the connections across the readers, but concentrates them on only one or two of the reader replicas. Usually we will see 40 connections on two of the readers and no connections on the other two, and there are times when we will see all 80 of our DB connections on a single reader while all three of the other readers sit idle.

After some discussion with AWS support we’ve found out that the load balancing algorithm for RDS does not play nicely with the DBConnection pool. Specifically, we were told that if multiple connections occur at or around the same time, then all of those connections can be sent to the same reader endpoint. And, since DBConnection acquires all of its connections immediately upon startup, this means that all of those connections will be sent to the same reader endpoint when the Fargate task first starts up.

It seems that, in general, the RDS load balancing algorithm assumes that a new connection will be opened up for each query, rather than having persistent connections all opened up at the same time. Given that we have no control over the load balancing algorithm, I’m wondering if there’s anything we can do in our application code or in the configuration of the DBConnection pool to alleviate this problem?

Is there any way, for instance, to configure DBConnection to introduce a small delay (maybe ~1s) between opening its connections at startup? Or, even better, a way to have DBConnection open its connections lazily, such that new connections are opened only when there is a request for a connection but all existing connections are currently checked out? Or would this require a custom connection pool implementation?

Any other ideas about how to get around this problem?

Thanks in advance for any help you can provide!

jvoegele · October 18, 2019, 6:34pm

Another way to get around this problem would be if the DBConnection pool were to reconnect after a configurable idle period. Or even allow application code to explicitly close connections and have them reconnect automatically. I don’t believe either of these is currently possible with DBConnection, though, right?

tracstarr · March 31, 2021, 2:43pm

Did you ever figure anything out with this issue? I’ve recently just come to the same conclusion and trying to figure out the best way to approach it.