Elixir synthetic performance test

sasajuric · February 7, 2017, 8:43am

I wrote some tips in this post. It’s about Phoenix, but some points can be applied to plain cowboy:

Bench an OTP release built for production.
Increase the max_keepalive option. The default of 100 means that cowboy drops the connection after it serves 100 requests, so there will be a lot of reconnecting.
Tune wrk parameters to use as few connections as possible to bring the server a little below overload. The reason is that we want to test the behaviour of the server in the normal mode of operation. Overload is not sustainable for longer periods, so measuring an overloaded server doesn’t tell us much.
I usually do this by starting a test with e.g. 4 threads and 16 connections and observing the load in htop. If it’s pegged at 100%, I need to reduce the number of connections. The target I aim for is a constant load above 90% but below 100%. Keep in mind that the number of conns must be divisible by the number of threads, so if you need to reduce the load, the next step is e.g. 3 threads with 15 conns.
Once you get satisfying numbers for a brief test (e.g. 10s), run a slightly longer test (e.g. 60s). If all went well, the numbers should be roughly similar in the longer test.
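
To make the max_keepalive tip concrete, here is a minimal sketch of how the option could be set for a Phoenix endpoint. The app name :my_app, the endpoint module MyApp.Endpoint, the port, and the value 5_000_000 are all placeholders picked for illustration, not values from the post. The option is passed through to cowboy via protocol_options.

```elixir
# config/prod.exs (sketch; :my_app and MyApp.Endpoint are hypothetical names)
# On Elixir versions older than 1.9, use `use Mix.Config` instead of `import Config`.
import Config

config :my_app, MyApp.Endpoint,
  http: [
    port: 4000,
    # Handed to cowboy as-is. A large value keeps connections open for the
    # whole bench run instead of closing them after 100 requests.
    protocol_options: [max_keepalive: 5_000_000]
  ]
```

With that in place, the starting point described above would be something like wrk -t4 -c16 -d10s http://localhost:4000/ (the URL is again an assumption), adjusting the -t and -c values until htop shows the load just under 100%.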

Once you have a stable behaviour for both implementations, I think a much longer test would be needed to take GC effects into account. I’d likely go for a test of a couple of hours, paying attention that nothing else runs on the test server.

However, no disrespect, but I personally think that these simple bench tests are a pretty shallow criterion for choosing a technology. Raw speed only matters to some extent, and past that point it might even be counterproductive. Way back when I was evaluating Erlang, I made a quick simulation of the target server, and then ran a 12-hour test at 10x the estimated capacity to verify whether the performance was good enough. Once I proved that it was, I didn’t care anymore whether something else was faster, because it was good enough for my case.

The thing is that there’s more to a system than just plain speed. There are other important factors to consider, such as fault-tolerance, fair distribution of CPU time, and support for runtime analysis (so we can understand what goes on in a system that handles thousands or millions of different “things”).

I had never heard of Vert.x before, but seeing that it’s based on a non-blocking event-based approach and runs on Java, I’d guess that it suffers from issues such as cooperative scheduling, anonymous implicit activities, and callback hell.

Here’s one way we could compare Elixir (or any other BEAM language) against that. Make a simple server which handles two requests: short and infinite. Make short do something trivial like return “Hello World!”. Make infinite run an infinite tight CPU bound loop. For example, in Elixir it’s as easy as defp infinite(), do: infinite(). Now start the server, and for the sake of simplicity specify that you want to use just one scheduler thread (you can do it with --erl "+S 1"). Issue one infinite request. Then verify that your CPU usage is constantly at 100%. Now issue a short request and observe how you get an immediate response.
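
The same effect can be seen without an HTTP server at all. The sketch below is a stripped-down stand-in for the experiment above, not the server from the post: the module name PreemptionDemo and the file name are made up, plain processes play the role of the two requests, and the loop is a public def rather than defp so it can be spawned by name.

```elixir
# preemption_demo.exs
# Run with a single scheduler thread: elixir --erl "+S 1" preemption_demo.exs
defmodule PreemptionDemo do
  # Tight CPU-bound loop that never returns (the "infinite" request).
  def infinite(), do: infinite()

  # Trivial piece of work (the "short" request).
  def short(), do: "Hello World!"

  # Starts the CPU burner, then times a short job that runs in a separate
  # process, including the time spent waiting to be scheduled.
  # Returns {burner_pid, elapsed_microseconds, result}.
  def run() do
    burner = spawn(&infinite/0)

    {micros, result} =
      :timer.tc(fn ->
        Task.async(&short/0) |> Task.await(5_000)
      end)

    {burner, micros, result}
  end
end

{burner, micros, result} = PreemptionDemo.run()
IO.puts("short returned #{inspect(result)} after #{micros} microseconds")
IO.puts("burner still running? #{Process.alive?(burner)}")

# Clean up so the script can exit.
Process.exit(burner, :kill)
```

If the scheduler were cooperative, the short job would never get a turn and Task.await would time out after 5 seconds. Without the +S 1 flag the demo still works, but it proves less, because the short job can simply run on another core.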

This should prove that an occasional long running CPU bound request will not block your entire system nor significantly affect your latency. Then you can try the same thing with Vert.x and see the behaviour. Assuming you properly configure just one worker OS thread, I’d be very surprised if Vert.x wasn’t completely blocked by the infinite request.

Another interesting test is debugging in production. Keep the previous system running, and make sure that the infinite request is still running. Our aim is to discover what causes the high CPU usage without restarting the system, or needing to add additional logs and redeploy. A simple way to do this is to start the observer (:observer.start) and go to the processes tab. Wait for the next refresh (it may take a couple of seconds). At the top of the list you should see your top CPU burner process. By double clicking on it, you should see its current stacktrace. Finally, you will be able to kill the process by right clicking on it in the processes list.
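
When a GUI isn’t available, roughly the same steps can be done from an IEx shell attached to the running node. This is only a sketch of the idea, not what observer does internally: the module name TopBurner is made up, and it uses the growth in reduction counts over a short interval as a stand-in for CPU usage.

```elixir
defmodule TopBurner do
  # Samples the reduction count of every process twice, interval_ms apart,
  # and returns {pid, delta} for the process whose count grew the most.
  def find(interval_ms \\ 1_000) do
    before = snapshot()
    Process.sleep(interval_ms)
    later = snapshot()

    later
    |> Enum.map(fn {pid, reds} -> {pid, reds - Map.get(before, pid, 0)} end)
    |> Enum.max_by(fn {_pid, delta} -> delta end)
  end

  # Map of pid => reductions. Processes that die while we iterate make
  # Process.info/2 return nil, which the pattern in the generator skips.
  defp snapshot() do
    for pid <- Process.list(),
        {:reductions, reds} <- [Process.info(pid, :reductions)],
        into: %{},
        do: {pid, reds}
  end
end

# In IEx on the affected node:
#
#   {pid, _delta} = TopBurner.find()
#   Process.info(pid, :current_stacktrace)   # what is it doing right now?
#   Process.exit(pid, :kill)                 # stop just that one process
```

Sampling twice matters: a long-lived but idle process can have a huge lifetime reduction count, so the difference between two snapshots is a better indicator of who is burning CPU right now.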

What this test proves is that BEAM goes way beyond “you can start up a lot of small activities”, and offers us additional ways of managing our production system and understanding what went wrong, which is very important if we plan on handling thousands or millions of different requests. We were able to quickly find what causes our CPU problems, and kill the thing without disturbing anything else in the system, or needing to restart the whole thing. AFAIK this is not possible with most (if not all) other technologies out there. A more realistic report of how observer was used to analyze a remote server can be found in the famous 2M Phoenix sockets article.

Btw. if anyone’s interested, and in the area, I plan to demo this live at my upcoming ElixirDaze talk next month