Elixir synthetic performance test

mpugach · February 6, 2017, 1:27pm

A boss of mine is not convinced that we should use Elixir for our production apps.

The latest round of our debate was this test.

He wrote two small apps using Vert.x and Elixir.

They render static text and real data from MongoDB serialized into JSON.

For the static response, Vert.x performed about 10% better. It was also more stable once the JVM was warmed up enough. Both show some lag after a certain number of requests; we think it is GC. But in the Elixir version the lag appears more frequently.

For the real-data response, the Mongo serialization was broken in the Elixir version.

So I fixed the serialization, removed Plug (since it is not needed here), and experimented with HiPE compilation. I also moved to cowboy 2.0.0-pre.6 (but did not measure this part enough; nothing seemed to change for me). Here is the test app (you can tune the wrk and ab params for your machine, since mine is old).

Could you please help me build the most optimized app for this test?

Personally, I have doubts about this part: the MongoDB driver first creates some structs, then we transform those structs into strings. Too much work, I think.

Also, what can we do to optimize the static part further?

michalmuskala · February 6, 2017, 2:41pm

At a quick glance, I can see two things:

You only have 10 database connections open (that’s the default). Depending on the concurrency of the test, it might be a good idea to open some more.
Another place you could optimise is to try a NIF-based JSON encoder (like jiffy) instead of Poison.

OvermindDL1 · February 6, 2017, 3:19pm

Especially the JSON encoder, since MongoDB is so oddly designed in that it uses so much JSON…
You really should not use MongoDB…

dom · February 6, 2017, 10:59pm

The optimization discussion is always interesting, but… does a 10 or 20% difference in speed really matter here? You’re basically choosing between the JVM and the BEAM. I think the difference is close enough that you should consider maintainability, learning curve, debuggability, whether you’ll need clustering or not, etc.

In my experience, being able to run multiple apps in a cluster, and to connect to it anytime to trace and analyze things, was a game changer. On the other hand, the ease of finding programmers and the high adoption in the enterprise world are big pluses of the JVM.

(Sorry for not actually answering the question!)

mpugach · February 7, 2017, 5:23am

Thank you

With wrk -t20 -c100 -d15s, it improved from 13k to 16k requests.

sasajuric · February 7, 2017, 8:43am

I wrote some tips in this post. It’s about Phoenix, but some points can be applied to plain cowboy:

Bench an OTP release built for production.
Change the max_keepalive option. The default of 100 means that cowboy drops a connection after it serves 100 requests on it, so there will be a lot of reconnecting.
Tune wrk parameters to use as few connections as possible to bring the server a little below overload. The reason is that we want to test the behaviour of the server in the normal mode of operation. Overload is not sustainable for longer periods, so measuring an overloaded server doesn’t tell us much.
I usually do this by starting a test with e.g. 4 threads and 16 connections and observing the load in htop. If it’s locked at 100%, then I need to reduce the number of connections. The target I aim for is a constant load above 90% but below 100%. Keep in mind that the number of conns must be divisible by the number of threads, so if you need to reduce the load, the next step is e.g. 3 threads with 15 conns.
Once you get satisfying numbers for a brief test (e.g. 10s), run a slightly longer test (e.g. 60s). If all went well, the numbers should be roughly the same in the longer test.

Once you have a stable behaviour for both implementations, I think a much longer test would be needed to take GC effects into account. I’d likely go for a test of a couple of hours, paying attention that nothing else runs on the test server.

However, no disrespect, but I personally think that these simple bench tests are a pretty shallow criterion for choosing a technology. Raw speed only matters to some extent, and past that point it might even be counterproductive. Way back when I was evaluating Erlang, I made a quick simulation of the target server, and then performed a 12-hour test with 10x the estimated capacity to verify whether the performance was good enough. Once I proved that it was, I didn’t care anymore whether something else was faster, because it was good enough for my case.

The thing is that there’s more to a system than just plain speed. There are other important factors to consider, such as fault-tolerance, fair distribution of CPU time, and support for runtime analysis (so we can understand what goes on in a system that handles thousands or millions of different “things”).

I’d never heard of Vert.x before, but seeing that it’s based on a non-blocking, event-based approach and runs on Java, I’d guess that it suffers from issues such as cooperative scheduling, anonymous implicit activities, and callback hell.

Here’s one way we could compare Elixir (or any other BEAM language) against that. Make a simple server which handles two requests: short and infinite. Make short do something trivial like return “Hello World!”. Make infinite run an infinite, tight, CPU-bound loop. For example, in Elixir it’s as easy as defp infinite(), do: infinite(). Now start the server and, for the sake of simplicity, specify that you want to use just one scheduler thread (you can do it with --erl "+S 1"). Issue one infinite request. Then verify that your CPU usage is constantly at 100%. Now issue a short request and observe that you get an immediate response.

This should prove that an occasional long running CPU bound request will not block your entire system nor significantly affect your latency. Then you can try the same thing with Vert.x and see the behaviour. Assuming you properly configure just one worker OS thread, I’d be very surprised if Vert.x wasn’t completely blocked by the infinite request.
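The core of this experiment can also be reproduced without an HTTP server. Below is a minimal sketch, assuming plain processes stand in for request handlers (the PreemptionDemo module and its functions are hypothetical, not from the original test app): one process runs the tight loop from the post, and a Task serves the “short” request. Saved as demo.exs, it can be run with elixir --erl "+S 1" demo.exs to restrict the VM to a single scheduler.

```elixir
defmodule PreemptionDemo do
  # The "infinite" request: a tight, CPU-bound tail-recursive loop that never returns.
  def infinite, do: infinite()

  # The "short" request: trivial work.
  def short, do: "Hello World!"

  def run do
    # Start the CPU burner in its own process; it keeps a scheduler busy forever.
    burner = spawn(&infinite/0)

    # Serve a short request in another process. The BEAM preempts the burner
    # after a budget of reductions, so this completes promptly even with +S 1.
    reply = Task.await(Task.async(&short/0), 1_000)

    # Kill the burner so the demo terminates.
    Process.exit(burner, :kill)
    reply
  end
end

IO.puts(PreemptionDemo.run())
```

If the short request were blocked behind the loop, Task.await/2 would exit after the 1-second timeout instead of returning the greeting.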

Another interesting test is debugging in production. Keep the previous system running, and make sure that the infinite request is still running. Our aim is to discover what causes the high CPU usage without restarting the system, or needing to add additional logs and redeploy. A simple way to do this is to start the observer (:observer.start) and go to the processes tab. Wait for the next refresh (it may take a couple of seconds). At the top of the list you should see your top CPU-burning process. By double-clicking on it, you should see its current stacktrace. Finally, you can kill the process by right-clicking on it in the processes list.
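When no GUI is available (e.g. on a headless server), a rough programmatic analogue of observer’s processes tab can be sketched with Process.list/0 and Process.info/2. This is a minimal sketch, not how observer works internally: the hypothetical TopBurner module samples every process’s reduction count twice, reports the process that did the most work in between, and can then fetch that process’s stack trace and kill it.

```elixir
defmodule TopBurner do
  # Return the pid whose reduction count grew the most during `interval_ms`.
  # Reductions are the BEAM's unit of work, so this approximates "top CPU user".
  def find(interval_ms \\ 1_000) do
    first = snapshot()
    Process.sleep(interval_ms)
    second = snapshot()

    {pid, _delta} =
      second
      |> Enum.map(fn {pid, reds} -> {pid, reds - Map.get(first, pid, 0)} end)
      |> Enum.max_by(fn {_pid, delta} -> delta end)

    pid
  end

  # Fetch the current stack trace of `pid`, then kill only that process.
  def stacktrace_and_kill(pid) do
    {:current_stacktrace, trace} = Process.info(pid, :current_stacktrace)
    Process.exit(pid, :kill)
    trace
  end

  # Map of pid => reductions. Processes that died meanwhile make
  # Process.info/2 return nil, which the generator pattern skips.
  defp snapshot do
    for pid <- Process.list(),
        {:reductions, reds} <- [Process.info(pid, :reductions)],
        into: %{},
        do: {pid, reds}
  end
end
```

From a shell attached to the running node, TopBurner.find() |> TopBurner.stacktrace_and_kill() would roughly mirror the sort, double-click, and kill steps described above.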

What this test proves is that BEAM goes way beyond “you can start up a lot of small activities”, and offers us some additional ways of managing our production and understanding what went wrong, which is very important if we plan on handling thousands or millions of different requests. We were able to quickly find what causes our CPU problems, and kill the thing without disturbing anything else in the system, or needing to restart the whole thing. A more realistic report of how observer was used to analyze a remote server can be found in the famous 2M Phoenix sockets article. AFAIK this is not possible with most (if not all) other technologies out there.

Btw, if anyone’s interested and in the area, I plan to demo this live at my upcoming ElixirDaze talk next month.

mpugach · February 7, 2017, 2:24pm

Thank you for such a broad reply.

I have not tried this yet, but the case is mentioned in their documentation. To be fair, there is a workaround.

I was told there is JMX for this.

I will try to measure in normal mode.

In the overload case, the Vert.x version performs about 30% better for now. I don’t want to publish the results after your arguments about the methodology. The apps are accessible, so it can be verified (warm up the JVM first).

My previous 10% figure was based on comparing the Vert.x Mongo endpoint against the Elixir static endpoint, on another machine. I rewrote the app, so it is better now.

Maybe using Elli will make it even better.

I also need to verify the unhandled-exception case in the Vert.x app.

outlog · February 7, 2017, 2:57pm

Just after a quick look (might contain mistakes/misunderstandings on my part):

Elixir is benchmarked through a router while Vert.x is not.
The db query for Elixir has a limit of 20 while the Vert.x one has a limit of 10 (twice the data/serialization?).
The db pools are not identical: Vert.x has the default limit of 100 while Elixir has 20.

Not sure of the impact on the results, but it would be nice to remove some of the benchmark smell…

mpugach · February 7, 2017, 3:08pm

Thank you, will fix that.

sasajuric · February 7, 2017, 3:24pm

Oh, I’m positive there’s always a workaround, even if it’s not explicitly supported by the library itself. If worst comes to worst, you can always start such an activity in a separate OS process.

But the thing is that you have to know upfront whether, e.g., some request processing is blocking. And that becomes increasingly harder as the project becomes more complex (which, IME, inevitably happens to every software project other than the ones that are cancelled). The thing is that blocking might happen unintentionally, due to a bug or a non-optimal piece of code. Not only have I seen such a thing happen and paralyze production completely, but I actually caused it myself by introducing suboptimal code. With Elixir/Erlang, such a mistake is much less likely to take the whole production down, or even to have observable effects on it.

Another problem with explicitly identifying the blocking code is this: how can I know that foo() is potentially blocking for a long time? To know that, I need to understand the complete stack trace of foo, including my own code as well as the code of all dependencies invoked from it. And I need to consider every possible input that can arrive at foo. And when I make my decision, it’s only based on the current code snapshot. A seemingly simple and unrelated change might break my expectations tomorrow. I’m exaggerating, yes, but it’s a thing that becomes increasingly harder to manage as the code becomes more complex.

That problem is non-existent in Elixir/Erlang. If you want to run things separately, you run them in different processes. It’s as simple as that.
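A minimal sketch of that idea, assuming a Task.Supervisor hosts the work (the Isolated module is hypothetical, not part of any library): each unit of work runs in its own process, a crash comes back to the caller as a value instead of taking it down, and work that blocks past a deadline is killed, even if it is a tight CPU-bound loop.

```elixir
defmodule Isolated do
  # Run `fun` in its own supervised (but unlinked) process.
  # Returns {:ok, result}, {:exit, reason} if the process crashed,
  # or :timeout if it ran longer than `timeout_ms` and was killed.
  def run(sup, fun, timeout_ms) do
    task = Task.Supervisor.async_nolink(sup, fun)

    # Task.yield/2 waits up to timeout_ms; if there is still no reply,
    # Task.shutdown/2 kills the process (or returns a reply that raced in).
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      {:exit, reason} -> {:exit, reason}
      nil -> :timeout
    end
  end
end
```

In a real application the Task.Supervisor would normally be started under the application’s supervision tree rather than by hand with Task.Supervisor.start_link/0.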

I’m really not familiar with the JVM, but given its maturity I’m not surprised that there’s something like that. However, libraries such as Vert.x implement an additional lightweight mechanism on top of the VM, and therefore the request handlers are likely not special VM entities. In fact, in event-based technologies, request handlers are usually completely anonymous.

Now, given that you could have a single thread multiplexing thousands of different requests, some questions arise. Can you trace the execution of a single request? Can you get info (e.g. memory usage, stack trace) of a single request handler? Can you terminate a single request handler without disturbing anything else, even if that request is blocking? If the answer is no, then the tech is nowhere near the capabilities of Elixir/Erlang when it comes to analyzing and fixing a live running system.

I’d be somewhat surprised if Elixir turned out to be faster in any case. But as I said, considering only speed, and measuring it in a 15s synthetic bench, is IMO not a good comparison. The question should be whether both technologies are sufficiently performant for the real problem you’re solving. If yes, then it’s perhaps time to consider other aspects of both technologies, such as fault-tolerance support. If not, and assuming you invested some effort into making it faster, then I guess you need to discard the option which is not performant enough, even if that option is Elixir.
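On the BEAM, each of those questions has a direct answer because every request handler is its own process. The sketch below assumes a toy handler process (the Handler and HandlerInfo modules are hypothetical) and uses only built-in calls: Process.info/2 for one process’s memory and mailbox length, :erlang.trace/3 plus :erlang.trace_pattern/3 to trace calls made by that one process only, and Process.exit/2 to terminate it.

```elixir
defmodule Handler do
  # Toy request handler: answers {:req, from, n} with {:resp, n * n}.
  def loop do
    receive do
      {:req, from, n} ->
        send(from, {:resp, square(n)})
        loop()
    end
  end

  def square(n), do: n * n
end

defmodule HandlerInfo do
  # Memory (in bytes) and mailbox length of one specific process.
  def stats(pid), do: Process.info(pid, [:memory, :message_queue_len])

  # Ask the VM to send the caller a {:trace, pid, :call, {Handler, :square, args}}
  # message whenever `pid` (the only process with call tracing enabled)
  # calls Handler.square/1. The :local flag also catches local calls.
  def trace_square(pid) do
    :erlang.trace_pattern({Handler, :square, 1}, true, [:local])
    :erlang.trace(pid, true, [:call])
  end
end
```

After inspecting or tracing, Process.exit(pid, :kill) terminates just that handler. In practice you would more likely reach for :dbg from OTP’s runtime_tools, which wraps these same tracing BIFs.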

Mandemus · February 7, 2017, 4:32pm

Vert.x is a polyglot implementation of the Node paradigm on the JVM. You can spawn ‘verticles’ to handle a single API endpoint, and they all communicate through a messaging backend that extends to the clients. Think Node plus a message queue, perhaps. They do have some good abstractions to reduce callback hell.

I played with it for a while, but in the end it was a lonely affair, with very little activity on their Google group.

It should not be the GC. GC in Erlang is per-process, so there is no stop the world event.
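One way to check whether the lag observed earlier is actually caused by garbage collection is to ask the VM to report slow collections. The sketch below is one possible way to wire this up (the GcWatch module is hypothetical): it uses :erlang.system_monitor/2 with the long_gc option, which makes the VM send a {:monitor, pid, :long_gc, info} message whenever a single collection takes at least the given number of milliseconds. Because each collection belongs to one process, every report also names the process that was paused.

```elixir
defmodule GcWatch do
  # Report every single garbage collection that takes at least `threshold_ms`
  # of wall-clock time; reports are sent to the calling process.
  # This replaces any previously configured system monitor and
  # returns the previous settings.
  def start(threshold_ms) do
    :erlang.system_monitor(self(), [{:long_gc, threshold_ms}])
  end

  # Gather {gc_pid, info} reports that arrive within `window_ms`.
  def collect(window_ms) do
    deadline = System.monotonic_time(:millisecond) + window_ms
    do_collect(deadline, [])
  end

  defp do_collect(deadline, acc) do
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {:monitor, gc_pid, :long_gc, info} -> do_collect(deadline, [{gc_pid, info} | acc])
    after
      remaining -> Enum.reverse(acc)
    end
  end

  # Turn the system monitor off again.
  def stop, do: :erlang.system_monitor(:undefined)
end
```

Running this on the benchmark node during a wrk run should indicate whether long collections coincide with the lag spikes, and in which processes they happen.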

mpugach · February 8, 2017, 12:55pm

Thank you, guys. You’ve given me plenty of arguments to prepare my own answers.

I do not know if I will continue with the benchmark, but it would be nice to close the performance gap a little and collect adequate measurements, in addition to the other aspects you mentioned.

mkunikow · March 17, 2018, 11:17am

I love Vert.x + RxJava…, a good alternative to commercial Spring boilerplate.

OndrejValenta · August 11, 2019, 11:43pm

We also played with Vert.x for some time, but after a while it became obvious that it’s a somewhat obscure, although very, very fast, technology with a very small user base, or at least not a visible one.

I’ve tried their approach to web page generation, but it is light years away from the experience you can have with, say, ASP.NET Core. It is faster, definitely, but not when it comes to development; there, it felt like a student’s project that was good enough for his barber’s pages.

There are just six guys at Red Hat actively working on this project, and everything else is left to the community, and I don’t think that’s working out very well. I’m currently working on their https://start.vertx.io/ page; it could have been done in days, but we have been “working” on it for months because of their workload.

danyalmh · September 24, 2021, 11:00pm

You mentioned some lag in the JVM!
Which JVM GC were you using?

I use ZGC with a really huge amount of RAM, and no lag appears!

mpugach · September 25, 2021, 5:39am

Almost five years have passed, and I can’t find the Ruby-Vert.x-based repo to repeat the test.

But the main point in our case was not JVM vs. ErlangVM. It was about an easy transition for Ruby developers to a faster tech stack, since with Vert.x you just write Ruby, and Elixir has a Ruby-ish syntax.

We found out that in both cases, Ruby developers have to adjust to a new mindset. The syntax is secondary.

If you want raw performance, you should go entirely with C++ or Rust (according to Round 20 of the TechEmpower benchmarks, the latest at the moment), or drop some NIFs into the critical parts of your app.

We have Ruby, Elixir, and Java (Spring, I believe) projects in our company. Clients come with a preset stack, and we have never had to convince them to change it because of an inability to scale.

If we start a new project, the choice is based on the available and planned resources.

I still feel that Elixir allows me to design better solutions in terms of architecture.