A significant number of Phoenix connections are closed under high concurrency

There is a piece of Elixir software that can generate some network load: https://github.com/mpugach/wdbomber-eli/tree/0.1.2

It uses Flow to dispatch many concurrent requests.
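
For readers who have not used Flow, a minimal sketch of that kind of dispatch could look like the following. It is not taken from wdbomber; the URL, the request count, and the `recv_timeout` value are assumptions chosen to match the numbers mentioned in this thread.

```elixir
# Assumptions: URL, concurrency, and timeout are illustrative only,
# and this is not necessarily how wdbomber builds its pipeline.
url = "http://localhost:3300/wd/hub/static"
concurrency = 500

results =
  1..concurrency
  # One stage per request, each asking for one item at a time,
  # so all requests are in flight simultaneously.
  |> Flow.from_enumerable(stages: concurrency, max_demand: 1)
  |> Flow.map(fn _n -> HTTPoison.get(url, [], recv_timeout: 310_000) end)
  |> Enum.to_list()

# Count the requests that failed with reason :closed.
closed =
  Enum.count(results, &match?({:error, %HTTPoison.Error{reason: :closed}}, &1))

IO.puts("closed: #{closed} of #{length(results)}")
```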

We use it to load test our Phoenix API.

The peculiarity is that a response time of 10-300 seconds (!) is normal, but we need to handle many such requests at the same time.

For some reason, about 300 of 500 requests return almost immediately with {:error, %HTTPoison.Error{id: nil, reason: :closed}}, with no log entries on the Phoenix side in the development environment.

The Ruby production server (which we want to replace) fails for only 10 of them.

Here is the endpoint config:

  http: [
    port: System.get_env("PORT") || 4000,
    protocol_options: [idle_timeout: 1_300_000, inactivity_timeout: 1_300_000, max_keepalive: 5_000_000],
    timeout: 1_300_000,
    transport_options: [num_acceptors: 10_000, max_connections: :infinity]
  ]

So, what are we missing?

HTTPoison uses a connection pool (through hackney). I think the above error is due to your pool size: once all connections in the pool are in use, the above error is inevitable.
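
If it helps, here is a rough sketch of starting a dedicated hackney pool sized to the number of concurrent workers and pointing HTTPoison at it. The pool name `:load_test_pool` and all the numbers are assumptions for illustration, not values from your project.

```elixir
# Assumption: pool name and limits are made up for this example.
pool_opts = [timeout: 150_000, max_connections: 500]

children = [
  # hackney ships a child spec helper for named pools.
  :hackney_pool.child_spec(:load_test_pool, pool_opts)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Each request names the pool explicitly; recv_timeout is raised
# because responses here can legitimately take minutes.
HTTPoison.get(
  "http://localhost:3300/wd/hub/static",
  [],
  hackney: [pool: :load_test_pool],
  recv_timeout: 310_000
)
```

In a real application the child spec would normally go into the application's own supervision tree rather than a standalone supervisor.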


Thank you @kodepett for the hint

I specify the pool size here https://github.com/mpugach/wdbomber-eli/blob/0.1.5/lib/wdbomber/client.ex#L14

so it should be the same as the number of concurrent workers.

Still, a number of requests fail with the same error:

➜  wdbomber-eli git:(master) ./wdbomber http://localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     294
➜  wdbomber-eli git:(master) ./wdbomber http://localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     220
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     214
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
      26
➜  wdbomber-eli git:(master) ./wdbomber http://@localhost:3300/wd/hub/static 1 500 1 | grep closed | wc -l
     199

How about disabling connection pooling and opening a connection on the fly for each request? I understand the performance gain from using a pool (no repeated TCP handshake, etc.), but this would help us isolate the problem. If the error goes away, we can focus on the pool; if not, we look in a different area. My two cents, sir.
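
Something along these lines should bypass the pool for a single request. This is only a sketch: the URL is copied from your commands above and the `recv_timeout` value is my assumption based on the 10-300 second responses you described.

```elixir
url = "http://localhost:3300/wd/hub/static"

# pool: false tells hackney to open a fresh connection for this request
# instead of checking one out of a pool.
options = [hackney: [pool: false], recv_timeout: 310_000]

case HTTPoison.get(url, [], options) do
  {:ok, %HTTPoison.Response{status_code: status}} ->
    IO.puts("status #{status}")

  {:error, %HTTPoison.Error{reason: reason}} ->
    IO.puts("failed: #{inspect(reason)}")
end
```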


Thank you, it fixed the issue

We have to look into the connection pool issue and see what we can do differently to avoid the error. I've seen a lot of discussions about the connection pool. If you are able to look into it, let me know. Thanks a lot.
