I am a total beginner and took to building a basic TCP server with concurrency in Elixir. Under performance testing this server is… completely terrible, to the point where there has to be a mistake in it somewhere. I am not really sure where the error is, though. Since I'm sure it's something dumb, it's probably obvious. I mostly used the pattern in the docs here, so it is puzzling…
Hey @onelastdance, can you be more specific about which metric you are using to judge performance? I see client code, but I don't see anything actually being measured.
I was actually expecting someone to just point out I was doing something obviously wrong, so I didn't bother. I am running oha (a Rust HTTP load-testing tool) with 100 concurrent connections against that server running locally. The response times build higher and higher: over a 20-second test it takes 15 seconds for some connections to receive a response. Something strange is happening, but I'm too new to the platform to figure it out.
That would explain the numbers I'm seeing in testing, but why is it synchronous? Processes are spawned for each accepted connection, so there should be parallelism?
The implementation is still a single acceptor and not a pool, which will be one bottleneck, and a single supervisor, which will be another. The code in the guides was not optimized for performance at all. I will add some notes, but consider looking into Ranch (Erlang) or Thousand Island (Elixir).
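For illustration, here is a minimal sketch of the acceptor-pool idea (the module and function names are made up for this example, this is not the guide's code): several processes block on `:gen_tcp.accept/1` against the same listen socket, so accepting connections is no longer serialized through a single process.

```elixir
defmodule AcceptorPool do
  # Hypothetical sketch of an acceptor pool.
  def start(port, pool_size \\ 10) do
    {:ok, listen_socket} =
      :gen_tcp.listen(port, [:binary, packet: :raw, active: false, reuseaddr: true])

    # Several acceptors share one listen socket.
    for _ <- 1..pool_size do
      spawn_link(fn -> accept_loop(listen_socket) end)
    end
  end

  defp accept_loop(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)
    # Hand the connection to its own process so this acceptor can loop immediately.
    pid = spawn(fn -> handle(socket) end)
    :gen_tcp.controlling_process(socket, pid)
    accept_loop(listen_socket)
  end

  defp handle(socket) do
    # ... read the request and send a response here ...
    :gen_tcp.close(socket)
  end
end
```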
There was also a fantastic series of articles on writing a scalable TCP acceptor for Erlang (or Elixir), but I think it was a decade old or so and I cannot find it.
Even with a single acceptor loop I would still expect 100 concurrent connections to be fine, though. I'm really not sure about the supervisors at all, since that's just a black box to me. To me it looks like :gen_tcp.send/2 is actually behaving synchronously and/or getting stuck somehow, since even the simplest version of this program, where you respond immediately with a 200 OK, is still slow…
Yes, I agree with that assessment. I can try to run your gist later. The most obvious thing that has shown up so far is that you are using String.length/1, but that counts graphemes, not bytes; you should use byte_size/1 instead. This could lead to a wrong Content-Length, but I am assuming you are not echoing Unicode characters.
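For anyone following along, the difference only shows up with multi-byte characters:

```elixir
iex> String.length("héllo")   # graphemes
5
iex> byte_size("héllo")       # bytes on the wire, which is what Content-Length should report
6
```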
Also double check that the client is happy with your HTTP responses. It may be that it doesn't like HTTP/1.1 200 OK\r\n\r\n at all and expects a Content-Length header or similar.
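If it helps, a minimal response that load-testing clients generally accept looks something like this (the body and variable names here are just placeholders, and `socket` is assumed to be the accepted connection):

```elixir
body = "ok"

# Explicit Content-Length in bytes, then a blank line, then the body.
response =
  "HTTP/1.1 200 OK\r\n" <>
    "content-length: #{byte_size(body)}\r\n" <>
    "\r\n" <>
    body

:ok = :gen_tcp.send(socket, response)
```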
You could also try thousand_island or ranch and compare results.
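For comparison, an echo handler on top of Thousand Island is roughly the following (a sketch based on its Handler behaviour; check the thousand_island docs for the current API and options):

```elixir
defmodule EchoHandler do
  use ThousandIsland.Handler

  @impl ThousandIsland.Handler
  def handle_data(data, socket, state) do
    # Echo the received bytes back and keep the connection open.
    ThousandIsland.Socket.send(socket, data)
    {:continue, state}
  end
end

{:ok, _pid} = ThousandIsland.start_link(port: 4000, handler_module: EchoHandler)
```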
In the interest of science, I boiled down the whole problem to this script:
Most of the code is just window dressing; the question is why that code is so slow. Here are the performance stats from oha. The numbers just don't pass the smell test, since I've built web servers in other languages and have an intuition for the numbers even from a simple, unoptimized implementation.
The difference is ~8000 req/sec on a single connection but less than 1 req/sec with 100 connections. It has to be port exhaustion or something going on in the background…
{nodelay, Boolean} (TCP/IP sockets) - If Boolean == true, option TCP_NODELAY is turned on for the socket, which means that also small amounts of data are sent immediately.
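In Elixir that option would be passed to the listen call, something along these lines (the other options are just a typical example; accepted sockets inherit them from the listen socket):

```elixir
{:ok, listen_socket} =
  :gen_tcp.listen(4000, [
    :binary,
    packet: :raw,
    active: false,
    reuseaddr: true,
    # Disable Nagle's algorithm so small responses are flushed immediately
    # instead of waiting to be coalesced with later writes.
    nodelay: true
  ])
```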
The documentation you referenced is not really meant to teach you how to build a performant TCP server. It's rather meant to teach you how to build your first TCP server that works, which is a whole different goal. But @josevalim has added a section pointing to production-grade implementations people can study.
If someone wants to send a PR that mentions gen_tcp options for fine-tuning, with a link to the documentation and without mentioning any option in particular, it would be welcome as well!