DBConnection Error - connection not available and request was dropped

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2733ms. You can configure how long requests wait in the queue using :queue_target and :queue_interval. See DBConnection.start_link/2 for more information

Where do I edit this queue_interval and queue_target ??? Please help.

In config/<env>.exs (where <env> is dev, test, or prod):

config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.Postgres,
  pool_size: 10,
  migration_timestamps: [type: :utc_datetime_usec],
  migration_lock: nil,
  queue_target: 5000

Is what we had to do recently due to increased number of errors in production.

9 Likes

Hi,

reviving this ancient topic as the error is the same, the solution would probably also work but doesn’t quite feel right. I’d rather try to understand why this is happening.

We’re also experiencing this error in a production environment. It happens rarely (maybe every 2-3 days), often comes in batches of 1-3, and does not cause any other exceptional output in our logs.

DBConnection.ConnectionError: connection not available and request was dropped from queue after 2290ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. By tracking down slow queries and making sure they are running fast enough
  2. Increasing the pool_size (albeit it increases resource consumption)
  3. Allow requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

(db_connection 2.2.2) lib/db_connection.ex:745: DBConnection.run/3
(stdlib 3.12.1) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib 3.12.1) gen_server.erl:711: :gen_server.handle_msg/6
(stdlib 3.12.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Now to the interesting bits:

  • The server does not have any regular traffic (we’re pre-launch), and we’ve seen the error happen at night time as well, so these are quite likely not caused by user requests.
  • My main idea for a culprit: For scheduled (cron) jobs, we have Oban running. These jobs are also likely to checkout a DB connection.
  • Otherwise the application should be more or less asleep, at least judging from our code. We also have plenty of DB connections available, so in theory the pool should not run out of them.
  • I don’t really understand the stacktrace above: :gen_server.try_dispatch/6 is where GenServer sends the message to the callback module, I think. However, DBConnection does not use GenServer and its run/3 is certainly not a GenServer callback. It seems like there are some frames missing in the stacktrace?!

Does anyone have an idea how to debug this better?

[edit] Found this thread with the same error and Oban, but as said we have plenty of DB connections. POOL_SIZE is large.

1 Like

The cron scheduler checks out a connection once a minute. Each running queue will checkout a database connection once per second (unless you’ve configured a different poll invertal). The number of attempted connections will scale linearly with the number of queues, so that increases the potential to have db connection issues.

@sorentwo thanks, that is good to know. However, we only have 4 queues configured and a POOL_SIZE of 15, I think. Plus, I would assume these queries to be extremely quick, especially when there are usually no jobs returned, right? No reason to keep other connection checkouts waiting in Ecto’s queue for ~3s.

Also, sorry for saying that Oban would be the “culprit”. It’s the only “active” (in DB connection terms) component we have, hence could be related. But the 3s wait time in the queue makes me think the underlying issue is some hiccup in PG, Ecto, or DBConnection. It’s just too long to be caused by any “normal” operation we have.

Thanks for your insights.

With only four queues and a POOL_SIZE of 15 you definitely shouldn’t encounter db checkout queueing. The queries take a couple of milliseconds at most, and when there aren’t any jobs they are sub-millisecond.

Don’t sweat it! I’ve seen this issue for years now with and without Oban, so I believe there is something lower level happening that I haven’t figured out.

Excellent. Since you’ve also experienced this issue and can’t tell about its root cause, I’m willing to just let it go and silence our exception tracker. Thanks again!

(still any hints by others are welcome of course :smile:)

what do these parameters here mean, please?

https://hexdocs.pm/ecto_sql/Ecto.Migration.html#module-repo-configuration

2 Likes

Oh btw, also take a look at the types here, especially since we’re mentioning the date/time ones.

https://hexdocs.pm/ecto/Ecto.Schema.html#module-primitive-types

I am actually using utc_datetime_usec in my app to capture when the user clicked the enrol button, and then sort records based on that - the idea is maybe it’s more fair than looking at database row number. plus it looks super cool when you have a list of people in class sorted by time with microsecond precision :slight_smile:

EDIT> sneak peek

1 Like