How to get the status of an Ecto.Repo pool?

Hi,

I am developing an API that fetches data from multiple remote databases. For each of these databases, an Ecto.Repo is configured at runtime when the application starts.
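Roughly, the setup looks like this (a simplified sketch; the module name and connection details are placeholders, not my actual code):

```elixir
# Sketch of how the repos are started at runtime.
# MyApp.RemoteRepo and all connection values are placeholders.
defmodule MyApp.RemoteRepo do
  use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Tds
end

# For each remote database, at application start:
{:ok, _pid} =
  MyApp.RemoteRepo.start_link(
    hostname: "db1.example.com",
    database: "reporting",
    username: "api",
    password: "secret",
    pool_size: 10
  )
```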

When I try to fetch data from a disconnected database (network failure, database server not started, …), I get this error after a timeout:

** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2657ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Tracking down slow queries and making sure they are running fast enough
  2. Increasing the pool_size (albeit it increases resource consumption)
  3. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

Since a pool of connections is managed in the background, when I attempt to query a disconnected database, would it be possible to get an immediate error indicating that no connection is available, rather than having to wait around 2657ms?

Also, I would like my API to offer a way to report the status (online or disconnected) of each database without waiting for a 2657ms timeout. Any idea if Ecto.Repo allows that?

You might be misreading the error message. It says you are sending too many connection requests to that particular Repo’s pool, and the pool is thus timing out.

Can you check how many connections you configure per Repo?

The message seems to indicate otherwise, but that is exactly the error I get while trying to query a repo that doesn’t have any connections.

I left the defaults, so there are supposed to be 10 connections per pool/repo, but there aren’t any connections when the database is unreachable.

For the purpose of my tests, I configured a repo that points to a non-existent (MS SQL) database server, and I see the following messages repeating indefinitely in the console:

[error] Tds.Protocol (#PID<0.362.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.364.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.360.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.363.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.365.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.367.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.359.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.358.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.361.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused
[error] Tds.Protocol (#PID<0.366.0>) failed to connect: ** (Tds.Error) tcp connect: econnrefused

The pool keeps trying to recreate the connections on a regular basis, because anything can happen with the network, and that is great.

I would like to be able to get the status of a pool, or perhaps the number of active connections.

econnrefused means “connection refused”, which tells you that the MS SQL server isn’t there listening on the host/port you are expecting.

What’s your exact Repo configuration? At the very least, check that you can connect with Microsoft’s tools on the host/port pair.
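For reference, a Tds-based repo config typically looks something like this (all values are placeholders); the `hostname` and `port` here are what you should double-check against Microsoft’s tools:

```elixir
# config/runtime.exs — placeholder values, adjust to your setup
config :my_app, MyApp.RemoteRepo,
  hostname: "sqlserver.local",
  port: 1433,           # default MS SQL port
  username: "api",
  password: "secret",
  database: "reporting",
  pool_size: 10         # the default pool size
```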

You are right.

My config for the simulated unreachable database server was using localhost as the host. I changed the config to target a real database server, which I turned off, and here are the errors (which also repeat indefinitely in the console):

[error] Tds.Protocol (#PID<0.638.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.630.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.636.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.632.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.629.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.631.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.635.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.633.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.634.0>) failed to connect: ** (Tds.Error) tcp connect: timeout
[error] Tds.Protocol (#PID<0.637.0>) failed to connect: ** (Tds.Error) tcp connect: timeout

That’s slightly different, but not by much. It can mean that your database is behind a firewall. Is it on your local network?

Yes, the “unreachable” database server is on the local network, behind a firewall. But it is unreachable on purpose: my goal is to write a test that simulates an unreachable database server.

My test already works, but it takes around 2657ms to return an answer. I would like a way to return an immediate response indicating that the database is not available when there are no connections in the pool, rather than trying anyway and waiting for the timeout.

I’d think Ecto.Repo is not designed with this in mind, but what is stopping you from giving it a much smaller timeout, say 50–100ms? That should be enough to make the tests fail rather fast.
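An untested sketch of what I mean (repo and app names are placeholders); note that depending on where the time is actually spent, you may need the pool’s `:queue_target`/`:queue_interval` options from DBConnection rather than `:timeout`:

```elixir
# Option 1: per-call timeout (ms) on a cheap probe query
MyApp.RemoteRepo.query("SELECT 1", [], timeout: 100)

# Option 2: repo-wide, in config (these are DBConnection options)
config :my_app, MyApp.RemoteRepo,
  timeout: 100,        # query execution timeout
  queue_target: 50,    # target checkout wait time
  queue_interval: 100  # window over which the target is evaluated
```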

I seem to have misread your requirement for a “deliberately failing but you want it to fail fast” test. Apologies.

Configuration options docs:

Especially the latter docs are pretty good. You should be able to set up your Ecto.Repos in a way that serves your requirements.


Changing :queue_interval in DBConnection to 100ms (instead of the default 1000ms) seems to do the trick. Thank you!

However, I must admit that I do not fully understand the explanation of :queue_target and :queue_interval in the documentation.
I have 10 (failing) connections in my pool, queue_target: 50, and queue_interval: 100, and I cannot come up with a theoretical time it should take for a request to be dropped from the queue.
Tests show results varying between 111ms and 202ms, which is far more acceptable than 2657ms for my use case.
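In case it helps someone, the relevant part of my config is just (app and repo names are placeholders):

```elixir
config :my_app, MyApp.RemoteRepo,
  queue_target: 50,    # ms a checkout may wait before the pool is considered behind
  queue_interval: 100  # ms window over which queue_target is evaluated
```

If I understand the docs correctly, once checkouts keep exceeding queue_target over a whole queue_interval, new requests start being dropped, so the worst case lands somewhere around one to two intervals; that would roughly match the 111–202ms I measured.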