It is internal to your application. Every benchmark must be done with your specific requirements in mind. Are there any better benchmarks out there for everyone to see and compare? No, not that I know of. But the TE benchmarks are less than ideal. I think they have the right idea in trying to measure more than plain reading and writing to a socket, which is what a lot of HTTP benchmarks do: they add some computation and database IO in the background. Where I think they fail is in how the tests are executed and measured.
They also state themselves that the benchmarks cannot be used to compare different frameworks and/or technologies, and that they only show the "maximum" a framework can reach with the wind at its back, a slight downhill slope and lots of luck.
They measure a number of runs and take the one with the best throughput, disregarding maximum and 99th-percentile latency; throughput is all that counts. They also fall into the "coordinated omission" trap, which is a real problem if the client side is open to the internet, as opposed to an internal API between two closed servers (where you can limit connections and apply backpressure).
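To make the coordinated omission point concrete, here is a minimal sketch (my own toy numbers, not TE's harness): a single-connection load generator intends to send 100 req/s against a server that answers in 1 ms except for one hypothetical 2-second stall. A closed-loop client only measures from the moment it actually sends, so the stall shows up as one slow sample; measuring from when each request *should* have been sent reveals that every request queued behind the stall was slow too.

```go
package main

import (
	"fmt"
	"time"
)

const (
	n        = 500                    // requests in the simulated run
	interval = 10 * time.Millisecond  // intended open-loop rate: 100 req/s
	slow     = 100 * time.Millisecond // threshold for a "bad" request
)

// serviceTime fakes a server that answers in 1ms, except for one
// 2-second stall at request 100 (a GC pause, say).
func serviceTime(i int) time.Duration {
	if i == 100 {
		return 2 * time.Second
	}
	return time.Millisecond
}

// simulate replays the load generator against the fake server and returns
// how many requests exceeded the threshold under (a) closed-loop
// accounting, where latency is measured from the moment the client
// actually sends, and (b) corrected accounting, where it is measured
// from when the request *should* have been sent.
func simulate() (closedSlow, correctedSlow int) {
	var clock time.Duration // simulated wall clock
	for i := 0; i < n; i++ {
		intended := time.Duration(i) * interval
		if clock < intended {
			clock = intended // client idles until its next scheduled send
		}
		done := clock + serviceTime(i)
		if serviceTime(i) > slow { // closed-loop: only the stalled request looks slow
			closedSlow++
		}
		if done-intended > slow { // corrected: everything queued behind it does too
			correctedSlow++
		}
		clock = done
	}
	return
}

func main() {
	c, o := simulate()
	fmt.Printf("requests over %v: closed-loop %d/%d, corrected %d/%d\n",
		slow, c, n, o, n)
}
```

The closed-loop view reports a single outlier; the corrected view shows that a large fraction of the run blew the latency budget, which is exactly the kind of thing a best-throughput-run scoreboard hides.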
I've done a fair bit of benchmarking for our internal application in erlang, golang and java (and python, but that was discarded quickly). The numbers in the TE benchmarks don't stack up in our scenario and are often misleading, especially as we are very concerned with maximum and high-percentile latency.
We are doing heaps of crypto, calls out to external HTTP servers and some database IO (though most of it is cached). In our tests golang is the fastest (but only by 10-20%), then erlang and finally java. On the other hand, erlang is the most stable and gives the most even latency. Even under 95% CPU load we still keep maximum latency within reasonable bounds (average 15 ms, max 200 ms).
golang's latency starts to degrade under those circumstances, and java goes off the rails at very low load (i.e. some requests take seconds!).
According to the TE benchmarks, golang and java should completely outshine erlang, but for our application it is much closer, and the throughput difference is so small that other factors (stability and fault tolerance) matter more.
Trying to create a better benchmark is obviously possible, but to make it more accurate the runs will need to last longer, and I suspect that makes it economically unfeasible for such a large number of frameworks.