** (RuntimeError) ThousandIsland on HTTP load test

I tested an HTTP endpoint that gives back json like {"user_token":null}.
But, the problem is some requests take too long (which results in 7989 timeouts…), and many warning messages in the logger.
What could be the problem?

-Production- 23:47:02.996 [error] Task #PID<0.12802.0> started from #PID<0.12800.0> terminating
** (RuntimeError) Unexpected error in accept: :emfile
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:42: ThousandIsland.Acceptor.accept/5
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.12800.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:47:02.996 [error] Task #PID<0.13939.0> started from #PID<0.13937.0> terminating
** (RuntimeError) Unexpected error in accept: :emfile
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:42: ThousandIsland.Acceptor.accept/5
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.13937.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:47:02.997 [error] Task #PID<0.13678.0> started from #PID<0.13676.0> terminating
** (RuntimeError) Unexpected error in accept: :emfile
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:42: ThousandIsland.Acceptor.accept/5
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.13676.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:47:02.999 [error] Task #PID<0.13336.0> started from #PID<0.13334.0> terminating
** (RuntimeError) Unexpected error in accept: :emfile
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:42: ThousandIsland.Acceptor.accept/5
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.13334.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
** (stop) exited in: GenServer.call(nil, :acceptor_info, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.16.2) lib/gen_server.ex:1103: GenServer.call/3
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:18: ThousandIsland.Acceptor.run/1
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.67281.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:48:40.104 [error] Task #PID<0.68437.0> started from #PID<0.66288.0> terminating
** (stop) exited in: GenServer.call(nil, :acceptor_info, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.16.2) lib/gen_server.ex:1103: GenServer.call/3
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:18: ThousandIsland.Acceptor.run/1
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.66288.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:48:40.108 [error] Task #PID<0.68560.0> started from #PID<0.66330.0> terminating
** (stop) exited in: GenServer.call(nil, :acceptor_info, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir 1.16.2) lib/gen_server.ex:1103: GenServer.call/3
    (thousand_island 1.3.5) lib/thousand_island/acceptor.ex:18: ThousandIsland.Acceptor.run/1
    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
Function: &ThousandIsland.Acceptor.run/1
    Args: [{#PID<0.12353.0>, #PID<0.66330.0>, %ThousandIsland.ServerConfig{port: 4000, transport_module: ThousandIsland.Transports.TCP, transport_options: [ip: {0, 0, 0, 0, 0, 0, 0, 0}], handler_module: Bandit.DelegatingHandler, handler_options: %{opts: %{http_1: [max_header_count: 50], http_2: [max_header_count: 50], websocket: []}, plug: {ChattingWeb.Endpoint, []}, handler_module: Bandit.InitialHandler, http_1_enabled: true, http_2_enabled: true, websocket_enabled: true}, genserver_options: [timeout: 60000], supervisor_options: [], num_acceptors: 1000, num_connections: 1048576, max_connections_retry_count: 5, max_connections_retry_wait: 1000, read_timeout: 60000, shutdown_timeout: 15000, silent_terminate_on_error: false}}]
-Production- 23:48:40.108 [error] Task #PID<0.68740.0> started from #PID<0.65202.0> terminating
** (stop) exited in: GenServer.call(nil, :acceptor_info, 5000)    
    

Load results:

$ wrk -t20 -c9900 -d30s "http://localhost:4000/token" --latency

Running 30s test @ http://localhost:4000/token
  20 threads and 9900 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   104.61ms  121.59ms   2.00s    97.84%
    Req/Sec     3.68k   676.65    14.18k    79.04%
  Latency Distribution
     50%   93.00ms
     75%  109.48ms
     90%  121.73ms
     99%  742.60ms
  2194409 requests in 30.10s, 0.89GB read
  Socket errors: connect 0, read 13492, write 0, timeout 7989
Requests/sec:  72906.82
Transfer/sec:     30.18MB

EMFILE from accept means the process is trying to use more socket connections than it’s allowed to have filehandles.

I’m not 100% certain how that translates into user-facing timeouts, but it certainly isn’t helping the system perform well.

You’ll either need to adjust the limit when starting the BEAM, or adjust the settings of ThousandIsland to not accept as many connections.

1 Like

You were absolutely right about file descriptors! Simply increasing the hard limit for the number of open descriptors per process completely solves the issue

Running 30s test @ http://127.0.0.1:4000/token
  20 threads and 9900 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   127.40ms   52.07ms   1.47s    94.65%
    Req/Sec     3.94k   356.81     8.21k    89.22%
  Latency Distribution
     50%  123.67ms
     75%  132.27ms
     90%  142.21ms
     99%  290.75ms
  2331183 requests in 30.10s, 0.94GB read
Requests/sec:  77451.24
Transfer/sec:     32.06MB