A significant number of Phoenix connections are closed under high concurrency

There is a piece of Elixir software that generates network load: https://github.com/mpugach/wdbomber-eli/tree/0.1.2

It uses Flow to dispatch many concurrent requests.

We use it to load test our Phoenix API.

The specific constraint is that a response time of 10-300 seconds (!) is normal, but we need to handle a lot of such requests at the same time.

For some reason, about 300 of 500 requests return almost immediately with {:error, %HTTPoison.Error{id: nil, reason: :closed}}, and there are no log entries on the Phoenix side in the development environment.

The Ruby production server (which we want to replace) fails for only 10 requests.

Here is the endpoint config:

  http: [
    port: System.get_env("PORT") || 4000,
    protocol_options: [idle_timeout: 1_300_000, inactivity_timeout: 1_300_000, max_keepalive: 5_000_000],
    timeout: 1_300_000,
    transport_options: [num_acceptors: 10_000, max_connections: :infinity]
  ]

So, what are we missing?

HTTPoison uses a connection pool. I think the above error is due to your pool size: once all connections in the pool are in use, the error is inevitable.
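To make the pool size explicit, here is a minimal sketch of a dedicated hackney pool (the HTTP client underneath HTTPoison) sized to the number of concurrent workers. The module name, the pool name `:load_test_pool`, and the timeout values are assumptions for illustration and are not taken from wdbomber.

```elixir
defmodule PoolSketch do
  # Hypothetical pool name; not taken from the wdbomber code.
  @pool :load_test_pool

  # Options for a dedicated hackney pool.
  # `timeout` is how long (ms) an idle connection is kept in the pool;
  # `max_connections` is set to the number of concurrent workers.
  def pool_options(concurrency) do
    [timeout: 150_000, max_connections: concurrency]
  end

  # Child spec to add to a supervision tree (needs hackney at runtime).
  def child_spec_for(concurrency) do
    :hackney_pool.child_spec(@pool, pool_options(concurrency))
  end

  # Per-request options that route a request through the pool above.
  # The 310 s receive timeout is an assumption based on the 10-300 s responses.
  def request_options do
    [hackney: [pool: @pool], recv_timeout: 310_000]
  end
end
```

A request would then look like `HTTPoison.get(url, [], PoolSketch.request_options())`.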

Thank you @kodepett for the hint

I specify the pool size here: https://github.com/mpugach/wdbomber-eli/blob/0.1.5/lib/wdbomber/client.ex#L14

so it should be the same as the number of concurrent workers.

Still, a number of requests fail with the same error:

➜  wdbomber-eli git:(master) ./wdbomber http://localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     294
➜  wdbomber-eli git:(master) ./wdbomber http://localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     220
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     214
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
      26
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     199
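The counts above come from grepping the output. As a rough sketch, the same breakdown could be computed in Elixir over a list of collected results, which also shows whether anything other than :closed is coming back. The module name is hypothetical, and the results are assumed to be the usual `{:ok, response}` / `{:error, error}` tuples.

```elixir
defmodule ResultTally do
  # Counts results by outcome: :ok for successes, the error reason otherwise.
  # Matching on %{reason: reason} also matches structs that have a :reason
  # field, such as %HTTPoison.Error{}.
  def by_outcome(results) do
    Enum.frequencies_by(results, fn
      {:ok, _response} -> :ok
      {:error, %{reason: reason}} -> reason
      _other -> :unknown
    end)
  end
end
```

Calling `ResultTally.by_outcome(results)` returns a map from outcome to count.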

How about disabling connection pooling, i.e. opening a connection on the fly for each request? I understand the performance gain from using a pool (no repeated TCP handshake, etc.), but this would help us isolate the problem. If the error goes away, we can focus on the pool; if not, on a different area. My two cents, sir.
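A minimal sketch of what that could look like with HTTPoison, assuming the request options are passed per call. The module name and the 310 s receive timeout are assumptions for illustration; `pool: false` is the hackney option for bypassing the pool.

```elixir
defmodule NoPoolSketch do
  # `pool: false` asks hackney to open a fresh connection for the request
  # instead of checking one out of a pool.
  def request_options(recv_timeout_ms \\ 310_000) do
    [hackney: [pool: false], recv_timeout: recv_timeout_ms]
  end

  # Requires HTTPoison to be available at runtime.
  def get(url) do
    HTTPoison.get(url, [], request_options())
  end
end
```

This trades the cost of a new TCP connection per request for taking the pool out of the picture while debugging.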

Thank you, it fixed the issue

We have to look into the connection pool issue and see what we can do differently to avoid the error. I’ve seen a lot of discussions on the connection pool. If you are able to look into it, let me know. Thanks a lot.