I am a total beginner and took to building a basic TCP server with concurrency in Elixir. Under performance testing this server is… completely terrible, to the point where there has to be a mistake in it somewhere. I am not really sure where the error is, though. Since I'm sure it's something dumb, it's probably obvious. I mostly used the pattern in the docs here, so it is puzzling…
Hey @onelastdance, can you be more specific about which metric you are using to judge performance? I see client code, but I don't see anything actually being measured.
I was actually expecting someone to just point out I was doing something obviously wrong, so I didn't bother. I am running oha (a Rust HTTP load-testing tool) with 100 concurrent connections against that server running locally. The response times build higher and higher: over a 20-second test it takes 15 seconds for some connections to receive a response. Something strange is happening, but I'm too new to the platform to figure it out.
That would explain the numbers I'm seeing in testing, but why is it synchronous? Processes are spawned for each accepted connection, so there should be parallelism?
The implementation is still a single acceptor and not a pool, which will be one bottleneck, and a single supervisor, which will be another. The code in the guides was not optimized for performance at all. I will add some notes, but consider looking into Ranch (Erlang) or Thousand Island (Elixir).
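For illustration, here is a minimal sketch of the acceptor-pool idea (the module and function names are made up for this example, this is not the guide's code): several processes block on `:gen_tcp.accept/1` against the same listen socket, so accepting connections is no longer serialized through a single process.

```elixir
defmodule AcceptorPool do
  # Hypothetical sketch of an acceptor pool.
  def start(port, pool_size \\ 10) do
    {:ok, listen_socket} =
      :gen_tcp.listen(port, [:binary, packet: :raw, active: false, reuseaddr: true])

    # Several acceptors share one listen socket.
    for _ <- 1..pool_size do
      spawn_link(fn -> accept_loop(listen_socket) end)
    end
  end

  defp accept_loop(listen_socket) do
    {:ok, socket} = :gen_tcp.accept(listen_socket)
    # Hand the connection to its own process so this acceptor can loop immediately.
    pid = spawn(fn -> handle(socket) end)
    :gen_tcp.controlling_process(socket, pid)
    accept_loop(listen_socket)
  end

  defp handle(socket) do
    # ... read the request and send a response here ...
    :gen_tcp.close(socket)
  end
end
```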
There was also a fantastic series of articles on writing a scalable TCP acceptor for Erlang (or Elixir), but I think it was a decade old or so and I cannot find it.
Even with a single acceptor loop I would still expect 100 concurrent connections to be fine, though. I'm really not sure about the supervisors at all, since that's just a black box to me. To me it looks like :gen_tcp.send/2 is actually behaving synchronously and/or getting stuck somehow, since even the simplest version of this program, where you respond immediately with a 200 OK, is still slow…
Yes, I agree with that assessment. I can try to run your gist later. The most obvious thing that has shown up so far is that you are using String.length/1, but that counts graphemes, not bytes; you should use byte_size/1 instead. This could lead to a wrong Content-Length, but I am assuming you are not echoing Unicode characters.
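For anyone following along, the difference only shows up with multi-byte characters:

```elixir
iex> String.length("héllo")   # graphemes
5
iex> byte_size("héllo")       # bytes on the wire, which is what Content-Length should report
6
```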
Also double check that the client is happy with your HTTP responses. It may be that it doesn't like HTTP/1.1 200 OK\r\n\r\n at all and expects a Content-Length header or similar.
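If it helps, a minimal response that load-testing clients generally accept looks something like this (the body and variable names here are just placeholders, and `socket` is assumed to be the accepted connection):

```elixir
body = "ok"

# Explicit Content-Length in bytes, then a blank line, then the body.
response =
  "HTTP/1.1 200 OK\r\n" <>
    "content-length: #{byte_size(body)}\r\n" <>
    "\r\n" <>
    body

:ok = :gen_tcp.send(socket, response)
```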
You could also try thousand_island or ranch and compare results.
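For comparison, an echo handler on top of Thousand Island is roughly the following (a sketch based on its Handler behaviour; check the thousand_island docs for the current API and options):

```elixir
defmodule EchoHandler do
  use ThousandIsland.Handler

  @impl ThousandIsland.Handler
  def handle_data(data, socket, state) do
    # Echo the received bytes back and keep the connection open.
    ThousandIsland.Socket.send(socket, data)
    {:continue, state}
  end
end

{:ok, _pid} = ThousandIsland.start_link(port: 4000, handler_module: EchoHandler)
```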
In the interest of science, I boiled down the whole problem to this script:
Most of the code is just window dressing; the question is why that code is so slow. Here are the performance stats from oha. The numbers just don't pass the smell test, since I've built web servers in other languages and have an intuition for the numbers even from a simple, unoptimized implementation.
The difference is ~8000 req/sec on a single connection but less than 1 req/sec with 100 connections. It has to be port exhaustion or something going on in the background…
{nodelay, Boolean} (TCP/IP sockets) - If Boolean == true, option TCP_NODELAY is turned on for the socket, which means that also small amounts of data are sent immediately.
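In Elixir that option would be passed to the listen call, something along these lines (the other options are just a typical example; accepted sockets inherit them from the listen socket):

```elixir
{:ok, listen_socket} =
  :gen_tcp.listen(4000, [
    :binary,
    packet: :raw,
    active: false,
    reuseaddr: true,
    # Disable Nagle's algorithm so small responses are flushed immediately
    # instead of waiting to be coalesced with later writes.
    nodelay: true
  ])
```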
The documentation you referenced is not really meant to teach you how to build a performant TCP server. It's rather meant to teach you how to build your first TCP server that works, which is a whole different goal. But @josevalim has added a section pointing to production-grade implementations people can study.
If someone wants to send a PR that mentions gen_tcp options for fine-tuning, with a link to the documentation and without mentioning any option in particular, it would be welcome as well!