How to get the status of an Ecto.Repo pool?

Hi,

I am developing an API that fetches data from multiple remote databases. For each of these databases, an Ecto.Repo is configured at runtime when the application starts.
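Roughly, the setup looks like this (a simplified sketch; the module name and connection details are placeholders, not my actual code):

```elixir
# Sketch of how the repos are started at runtime.
# MyApp.RemoteRepo and all connection values are placeholders.
defmodule MyApp.RemoteRepo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Tds
end

# For each remote database, at application start:
{:ok, _pid} =
  MyApp.RemoteRepo.start_link(
    hostname: "db1.example.com",
    database: "reporting",
    username: "api",
    password: "secret",
    pool_size: 10
  )
```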

When I try to fetch data from a disconnected database (network failure, database server not started, …), I get this error after a timeout:

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2657ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Tracking down slow queries and making sure they are running fast enough
  2. Increasing the pool_size (albeit it increases resource consumption)
  3. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

Since a pool of connections is managed in the background, when I attempt to query a disconnected database, would it be possible to get an immediate error indicating that no connection is available, rather than having to wait around 2657ms?

Also, I would like my API to offer a way to report the status (online or disconnected) of each database without waiting for a 2657ms timeout. Any idea if Ecto.Repo allows that?

You might be misreading the error message. It says you are sending too many connection requests to that particular Repo’s pool, and the pool is thus timing out.

Can you check how many connections you configure per Repo?

The message seems to indicate otherwise, but that is exactly the error I get while trying to query a repo that doesn’t have any connections.

I left the defaults, so there are supposed to be 10 connections per pool/repo, but there aren’t any connections when the database is unreachable.

For the purpose of my tests, I configured a repo that points to a non-existent (MS SQL) database server, and I see the following messages repeating indefinitely in the console:

[error] Tds.Protocol (#PID<0.362.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.364.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.360.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.363.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.365.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.367.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.359.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.358.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.361.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.366.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused

The pool keeps trying to recreate the connections on a regular basis, because anything can happen with the network, and that is great.

I would like to be able to get the status of a pool, or perhaps the number of active connections.

econnrefused means “connection refused”, which tells you that the MS SQL server isn’t there listening on the host/port you are expecting.

What’s your exact Repo configuration? At the very least, check that you can connect with Microsoft’s tools on the host/port pair.
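For reference, a Tds-based repo config typically looks something like this (all values are placeholders); the `hostname` and `port` here are what you should double-check against Microsoft’s tools:

```elixir
# config/runtime.exs — placeholder values, adjust to your setup
config :my_app, MyApp.RemoteRepo,
  hostname: "sqlserver.local",
  port: 1433,           # default MS SQL port
  username: "api",
  password: "secret",
  database: "reporting",
  pool_size: 10         # the default pool size
```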

You are right.

My config for the simulated unreachable database server was using localhost as the host. I changed the config to target a real database server, which I turned off, and here are the errors (which also repeat indefinitely in the console):

[error] Tds.Protocol (#PID<0.638.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.630.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.636.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.632.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.629.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.631.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.635.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.633.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.634.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.637.0>) failed to connect: ** (Tds.Error) tcp connect: timeout

That’s slightly different, but not by much. It can mean that your database is behind a firewall. Is it on your local network?

Yes, the “unreachable” database server is on the local network, behind a firewall. But it is unreachable on purpose: my goal is to write a test that simulates an unreachable database server.

My test already works, but it takes around 2657ms to return an answer. I would like a way to return an immediate response indicating that the database is not available when there are no connections in the pool, rather than trying anyway and waiting for the timeout.

I’d think Ecto.Repo is not designed with this in mind, but what is stopping you from giving it a much smaller timeout, say 50–100ms? That should be enough to make the tests fail rather fast.
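An untested sketch of what I mean (repo and app names are placeholders); note that depending on where the time is actually spent, you may need the pool’s `:queue_target`/`:queue_interval` options from DBConnection rather than `:timeout`:

```elixir
# Option 1: per-call timeout (ms) on a cheap probe query
MyApp.RemoteRepo.query("SELECT 1", [], timeout: 100)

# Option 2: repo-wide, in config (these are DBConnection options)
config :my_app, MyApp.RemoteRepo,
  timeout: 100,        # query execution timeout
  queue_target: 50,    # target checkout wait time
  queue_interval: 100  # window over which the target is evaluated
```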

I seem to have misread your requirement for a “deliberately failing but you want it to fail fast” test. Apologies.

Configuration options docs:

Especially the latter docs are pretty good. You should be able to set up your Ecto.Repos in a way that serves your requirements.


Changing :queue_interval in DBConnection to 100ms (instead of the default 1000ms) seems to do the trick. Thank you!

However, I must admit that I do not fully understand the explanation of :queue_target and :queue_interval in the documentation.
I have 10 (failing) connections in my pool, queue_target: 50, and queue_interval: 100, and I cannot come up with a theoretical time it should take for a request to be dropped from the queue.
Tests show results varying between 111ms and 202ms, which is far more acceptable than 2657ms for my use case.
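In case it helps someone, the relevant part of my config is just (app and repo names are placeholders):

```elixir
config :my_app, MyApp.RemoteRepo,
  queue_target: 50,    # ms a checkout may wait before the pool is considered behind
  queue_interval: 100  # ms window over which queue_target is evaluated
```

If I understand the docs correctly, once checkouts keep exceeding queue_target over a whole queue_interval, new requests start being dropped, so the worst case lands somewhere around one to two intervals; that would roughly match the 111–202ms I measured.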