Finch unable to provide a connection

Hello, in our Phoenix API we make numerous calls to external APIs such as OpenAI, with long responses that can take 2 to 15 seconds.
As the HTTP client we use Req ({:req, "~> 0.5.2"}). When the number of user requests to our API reaches approximately 20 per second (and therefore the same number of calls to the external API), we get the following error: (RuntimeError) Finch was unable to provide a connection within the timeout due to excess queuing for connections. Consider adjusting the pool size, count, timeout, or reducing the rate of requests if it is possible that the downstream service is unable to keep up with the current rate.

I experimented with the Finch options in the application's supervision children:

children = [
  {Finch,
   name: MyConfiguredFinch,
   pools: %{
     :default => [
       size: 1000,
       count: 200,
       pool_max_idle_time: 120_000,
       conn_max_idle_time: 120_000
     ]
   }},
  ....
]

but it doesn't seem to help.
I would highly appreciate any help, since this is happening in production.

Can you confirm that the upstream service is healthy and capable of handling your load?

Keep in mind that Finch connections are lazy: new connections are only opened when needed. If the upstream service is degraded at that time, trying to open a new connection may result in a timeout.

May I clarify: by the upstream service, do you mean the external API to which Req makes requests? If so, yes, from time to time it responds with 429, but not that often. Out of 1200 requests per minute only 600 were handled; for all the others I got the error mentioned above.
Is there a way to make connections eager?

Yes! I was referring to the external API that Req makes requests to.

Given the occasional 429 responses, it seems likely the issue is related to the service being unable to establish a new connection when Finch requires a new one (due to lazy initialization).

To confirm, you could try lowering the pool size and pool count (perhaps even setting it to 1) and see if the error persists. If this is indeed the issue, you’ll likely see timeouts, but the connection pool should remain functional and not raise this error.

Thank you very much, Gabriel.
Is the following setup correct in general? Also, does a pool size of 1 slow down performance?

Seems correct! But it will certainly slow down performance, and as it stands it would apply to all services called through this MyConfiguredFinch instance.

So, I would suggest 2 things:

  • Create a configuration for the specific service you are having trouble with.
  • Do not decrease it to 1 on the first try; lower it gradually, measure, and see.

Something like this:

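children = [
  {Finch,
   name: MyConfiguredFinch,
   pools: %{
     # Keep :default as it is (options taken from the original post)
     :default => [
       size: 1000,
       count: 200,
       pool_max_idle_time: 120_000,
       conn_max_idle_time: 120_000
     ],
     # Dedicated, smaller pool for the troublesome service only
     "https://your-faulty-service.com" => [size: 10, count: 1]
   }}
]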

It helped, but not completely: I get the same error less often, but it still occurs. Is there by any chance a way to catch this particular error? I can see only a RuntimeError without any additional markers.

You could probably match on its message in a rescue (or catch) clause.
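
A minimal sketch of that idea, assuming the error surfaces as a plain RuntimeError whose message starts with the text quoted earlier in this thread. The module name, function name, and the `:pool_timeout` error atom are hypothetical, chosen only for illustration:

```elixir
defmodule PoolErrorExample do
  # Prefix of the message quoted in the original post. Matching on message
  # text is brittle: it may change between Finch versions.
  @pool_error_prefix "Finch was unable to provide a connection"

  # Runs `fun` and turns the pool-queuing RuntimeError into a tagged tuple.
  # Any other exception is re-raised unchanged.
  def with_pool_error_handling(fun) when is_function(fun, 0) do
    {:ok, fun.()}
  rescue
    error in RuntimeError ->
      if String.starts_with?(error.message, @pool_error_prefix) do
        {:error, :pool_timeout}
      else
        reraise error, __STACKTRACE__
      end
  end
end
```

It could then wrap the actual call, for example `PoolErrorExample.with_pool_error_handling(fn -> Req.get!(url, finch: MyConfiguredFinch) end)`, where `url` is whatever endpoint you are calling.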

My suggestion now would be to increase the :pool_timeout option on the Finch request. See the Finch v0.19.0 documentation.
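
As a rough illustration, the timeout can be passed per request either to Finch directly or through Req. The URL and the timeout values below are placeholders, not recommendations; the `finch: MyConfiguredFinch` option assumes you want Req to use the Finch instance configured above rather than the one Req starts by default:

```elixir
# Directly with Finch: :pool_timeout is how long to wait for a free
# connection from the pool, :receive_timeout how long to wait for the response.
request = Finch.build(:get, "https://your-faulty-service.com/")

Finch.request(request, MyConfiguredFinch,
  pool_timeout: 15_000,
  receive_timeout: 30_000
)

# Through Req, which forwards the same options to Finch.
Req.get("https://your-faulty-service.com/",
  finch: MyConfiguredFinch,
  pool_timeout: 15_000,
  receive_timeout: 30_000
)
```

A longer pool timeout only makes callers wait longer for a free connection; it does not add capacity, so it is worth pairing with the measurements suggested below.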

Since some of your requests take a long time (e.g. 15 seconds), you may end up starving the pool.
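
To get a feel for the numbers, a back-of-the-envelope estimate based on Little's law (average requests in flight = arrival rate × average latency) can help. This is only a sketch using the figures from the original post; the module name is hypothetical, and it assumes HTTP/1 pools, where each connection serves one request at a time:

```elixir
defmodule PoolSizing do
  # Little's law: average in-flight requests = rate (req/s) * latency (s).
  def in_flight(rate_per_second, latency_ms) do
    rate_per_second * latency_ms / 1000
  end
end

PoolSizing.in_flight(20, 2_000)   # 40.0 concurrent requests at 2 s latency
PoolSizing.in_flight(20, 15_000)  # 300.0 concurrent requests at 15 s latency
```

Under these assumptions, a pool with size: 10 and count: 1 offers 10 connections for that host, so at 20 requests per second most callers would queue and some would hit the pool timeout.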

One way to measure this is to enable pool metrics with the :start_pool_metrics? option in the pool configuration passed to Finch.start_link. See the Finch v0.19.0 documentation.

Then run a periodic job that calls Finch.get_pool_status/2 and gathers these metrics, so you can check whether your pool is starving.
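
A minimal sketch of such a job, assuming the pool for the service was started with `start_pool_metrics?: true` and that the metrics are read with `Finch.get_pool_status/2` as described in the Finch v0.19 documentation. The module name, URL, and polling interval are hypothetical:

```elixir
defmodule MyApp.FinchPoolMonitor do
  # Hypothetical GenServer that periodically logs Finch pool metrics.
  use GenServer
  require Logger

  @url "https://your-faulty-service.com"
  @interval :timer.seconds(10)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    schedule()
    {:ok, nil}
  end

  @impl true
  def handle_info(:poll, state) do
    case Finch.get_pool_status(MyConfiguredFinch, @url) do
      {:ok, metrics} ->
        # One entry per pool, so `count: n` yields up to n entries.
        Enum.each(metrics, fn m ->
          Logger.info(
            "finch pool #{m.pool_index}: in_use=#{m.in_use_connections} " <>
              "available=#{m.available_connections} size=#{m.pool_size}"
          )
        end)

      {:error, :not_found} ->
        # Metrics are not enabled, or no pool exists yet for this URL.
        Logger.warning("no Finch pool metrics found for #{@url}")
    end

    schedule()
    {:noreply, state}
  end

  defp schedule, do: Process.send_after(self(), :poll, @interval)
end
```

Adding `MyApp.FinchPoolMonitor` to the supervision children after the Finch entry would start the polling. If `available_connections` stays at or near zero under load, the pool is likely starving.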