That’s debatable. Last time I checked, these tests ran for only 10 seconds, so they didn’t really measure the cost of garbage collection (which might significantly affect the results for runtimes with a stop-the-world GC). They also didn’t observe how CPU-bound tasks interfere with I/O-bound ones (which under some conditions can cause a tremendous latency increase). For other issues, see the comments made by @cmkarlsson in this thread.
It’s not just about the language. There are other properties, such as fault-tolerance, the ability to troubleshoot production, the ecosystem, and whatnot. Those things matter, and IMO matter much more than speed. Not that speed is irrelevant, but past some point of “acceptable”, it matters much less, and sometimes chasing it can be counterproductive.
There is a difference between “performance matters” and “I want to use every nanosecond of CPU time as best as possible”. Of course all of us want reasonable performance. On the other hand, aiming for the fastest possible framework (there’s no such thing, but let’s pretend it exists) is IMO usually wrong, b/c that speed gain is likely obtained through some trade-off, which might not be immediately obvious.
My usual advice is to measure whether the candidate framework is good enough for the desired case. That’s what I did when I first evaluated Erlang, by running a 12-hour load test on a simulation of the system, with 10x the estimated load. Once I was convinced that Erlang easily handles that (and that it can easily scale), I didn’t really care about the speed anymore. I knew that if in some special cases I needed to squeeze out the best performance, I could easily step outside of Erlang for that.
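To make the “good enough” check a bit more concrete, here is a rough sketch in Python (not what I originally used). It drives a handler at a multiple of the estimated request rate and reports latency percentiles. Everything in it is an assumption made for illustration: the names `run_load_test` and `is_good_enough`, the `handler` callable standing in for a request to your system, and the threshold. A real test would hit the actual system over the network and run for hours, not seconds.

```python
import time
from typing import Callable, Dict, List


def run_load_test(handler: Callable[[], None],
                  estimated_rps: float,
                  load_multiplier: float = 10.0,
                  duration_s: float = 1.0) -> Dict[str, float]:
    """Call `handler` at load_multiplier * estimated_rps for duration_s seconds.

    `handler` is a hypothetical stand-in for one request to the system under test.
    """
    target_rps = estimated_rps * load_multiplier
    interval = 1.0 / target_rps
    latencies: List[float] = []

    start = time.perf_counter()
    next_due = start  # scheduled start time of the next request
    while True:
        now = time.perf_counter()
        if now - start >= duration_s:
            break
        if now < next_due:
            time.sleep(next_due - now)
        handler()
        # Measure from the scheduled time, so that falling behind the
        # target rate shows up as latency instead of being hidden.
        latencies.append(time.perf_counter() - next_due)
        next_due += interval

    elapsed = time.perf_counter() - start
    latencies.sort()

    def percentile(p: float) -> float:
        index = min(len(latencies) - 1, int(p * len(latencies)))
        return latencies[index]

    return {
        "requests": float(len(latencies)),
        "target_rps": target_rps,
        "achieved_rps": len(latencies) / elapsed,
        "p50_s": percentile(0.50),
        "p99_s": percentile(0.99),
        "max_s": latencies[-1],
    }


def is_good_enough(report: Dict[str, float], max_p99_s: float) -> bool:
    """Pass/fail check: is the 99th percentile latency within the budget?"""
    return report["p99_s"] <= max_p99_s


if __name__ == "__main__":
    # Hypothetical numbers: 50 req/s estimated, tested at 10x for 2 seconds.
    report = run_load_test(lambda: None, estimated_rps=50, duration_s=2.0)
    print(report, is_good_enough(report, max_p99_s=0.1))
```

The point of the design is that the output is a yes/no answer for your own load profile, not a ranking. Once a candidate passes with a comfortable margin, the raw numbers stop being interesting.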
When I say “real world”, I mean systems running in production (and those being developed to run in production in the future). If it’s running in production, it’s real. Otherwise it’s not. Consequently, TE benches are not real-world, but rather some (IMO very poor) attempt to simulate production. If that makes me condescending, so be it.
The problem is that your production is not the same as mine, and definitely not the same as the thing being benched in TE. Hence, even though framework foo might be 100x faster than bar in the TE bench, it might well happen that bar actually produces better numbers for your case (or mine). Therefore, even if TE benches were done properly (and I don’t think they are), they still wouldn’t tell you a lot about your system, and might even lead you to a bad conclusion.
Which is why I believe there’s no such thing as a proper general benchmark comparison of frameworks. You can design one for your own system, simply to discard any framework which makes it very hard to deal with the desired load. Past that point, the decision on which framework to use should IMO mostly revolve around other properties.