Have Ecto wait for DB to spin up?

We’re considering using AWS Aurora Serverless PostgresSQL for our developer environment. I’ve already determined that it will meet our needs as far as Postgres compatibility, but I’m looking for help getting over the last inconvenience. Whenever the DB is idle for a certain amount of time, it spins down completely. It spins itself back up when the next connection attempt is made, but it takes ~20-30 seconds. What this means is that when I start my Elixir/Ecto/Postgrex application, I get a whole lot of DBConnection errors. I can then wait 20 seconds and restart and everything works fine.

One other thing that’s probably relevant is that the port is open and the cluster is listening even when it’s spun all the way down. That is, I can do the following:

  • See cluster capacity at 0
  • Run nc -v DB_HOST 5432 on my client, and immediately get Ncat: Connected to IP_ADDR:5432 in return
  • See cluster capacity spinning up to 2 just from that port connection

So I’m guessing that Postgrex isn’t failing to connect; it’s just failing to get a response in a timeframe that makes DBConnection happy.

I’ve set timeout, connect_timeout, and handshake_timeout to 60 seconds, but I’m still getting errors from GenServers and other processes that start up and immediately try to query the Repo, e.g.,

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2918ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

I have no idea what appropriate queue_target and queue_interval settings would be for this particular case. I’d also love to be able to display some kind of “Waiting for database to spin up…” message.

Any thoughts on this would be greatly appreciated.

1 Like

Reading the docs, I got the idea that the queue_ options are for preventing overload, but found a interesting one: after_connect_timeout

The maximum time allowed to perform function specified by :after_connect option (default: 15_000 ).

There is also a configure option, that you can use to print messages and reconfigure before any attempt to connect:

A function to run before every connect attempt to dynamically configure the options, either a 1-arity fun, {module, function, args} with options prepended to args or nil where only returned options are passed to connect callback (default: nil )

Here is the full docs on the options for DBConnect: DBConnection — db_connection v2.4.2

3 Likes

Ah, so if after_connect blocks, I could use it to wait for the connection to actually return something, with an after_connect_timeout of 60 seconds or so. Worth a try. Thank you!

1 Like

@mbklein I’ve run into similar challenges. How did you solve this problem?