Performance - Best practices?

Hi all,

I’ve run the TechEmpower fortune benchmark locally with Elixir/Phoenix and Golang/fasthttp, and removed all unnecessary code in both versions.
I’m trying to get the best performance out of the Elixir/Phoenix version.

Tuning for Elixir/Phoenix:

  • disabled all unnecessary plugs in router.ex:

    pipeline :browser do
      plug :accepts, ["html"]
    end

  • set DB adapter pool size to 40

  • used the same HTML template as Go

  • set logging to error

  • compiled protocols

  • start server: MIX_ENV=prod PORT=4000 elixir -S mix phx.server
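
For reference, the logging and pool-size parts of that tuning can be sketched in config/prod.exs roughly like this (the :hello app and Hello.Repo names are placeholders, not taken from the benchmark code):

```elixir
# config/prod.exs - a minimal sketch; app/repo names are hypothetical
use Mix.Config

# only log errors during the benchmark
config :logger, level: :error

# Ecto connection pool size (40 was used in the tests above)
config :hello, Hello.Repo,
  pool_size: 40
```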

No tuning for Go.

I use wrk to perform the load tests:
wrk -t6 -c12000 -d60s -T10s http://.../fortune

Client is physical and server is a domU/2vcpu/4GB. OS: FreeBSD. 10 runs for each load test.

For Elixir/Phoenix:

  • Average latency 408ms, max 8s
  • Req/s 2400

For Golang:

  • Average latency 123ms, max 1s
  • Req/s 8600

I did my first Go tests with net/http and peaked at 4000 req/s, so I replaced it with fasthttp and got 8500+ req/s.
Are there faster alternatives to Cowboy?
How do you fine-tune a production environment? What OS is best suited for the Erlang VM? What filesystem? Specific OS tunables?

Thanks

2 Likes

Uh, you have something MAJOR wrong there. Can you upload this to GitHub and give us the commands to git clone ... it, then a single (set of?) commands to run our own tests? Even the Golang average latency of 123ms seems crazy high!

2 Likes

-c12000: isn’t that 12k keepalive connections? Seems a bit much…

techempower is not real world - quite the opposite actually… so do your own benchmark similar to what your requirements are…

wrk is not a great benchmark tool… use something like https://gatling.io

also allow me to suggest this video

4 Likes

Also, since it’s Go vs Elixir (OTP/Erlang), this one is great for understanding latency, GC handling, etc.

video:

2 Likes

Are there faster alternatives to Cowboy ?

There is elli: TechEmpower Benchmarks Round 14 - #3

    for _ <- 1..n do
      id = rand_id()
      [row] = Hello.SQL.query(conn, "world_by_id", @world_by_id, [id])
      world(row)
    end

Might be replaced with Enum.map, since a for comprehension in Elixir compiles to a reduce where a map would suffice, IIRC. Though it’ll hardly change anything.
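
To illustrate (with a trivial stand-in for the query code; rand_id and Hello.SQL are not reproduced here), a single-generator comprehension and Enum.map build the same list:

```elixir
# A for comprehension with one generator and no filters produces
# the same result as Enum.map; the difference is only in what the
# two constructs compile down to.
ids = [1, 2, 3]

via_for = for id <- ids, do: id * 2
via_map = Enum.map(ids, fn id -> id * 2 end)

via_for == via_map
# => true (both are [2, 4, 6])
```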

I’m not sure if .eex templates compile to iolists. If not, then that’d also be a possible improvement in performance.

They do.

Thanks for Elli, I’ll give it a try :slight_smile:

To reproduce this test: just check out the techempower code, elixir/phoenix and golang/fasthttp versions.
For the phoenix version you may also bootstrap an empty app and add a fortune MVC.
Then add some basic tuning as I’ve done. The average latency is high probably because of the number of incoming connections - that’s the point. The same conditions apply to the Go version. I’d like to optimize at least:

  1. the average latency ratio between Elixir and Go: almost 4 :fearful:
  2. get rid of the max latency of 8s, which is a lot

The techempower/fortune benchmark gives us a good basis to apply the most basic tuning. That’s what I’m interested in now, and that’s why I use wrk and not JMeter. Later, if we choose Elixir, I’ll perform more advanced load tests.

Apart from plugs, DB pool size, and logging, are there other config parameters I could change to improve performance? BEAM parameters I could tune, JVM-style?

Then at what point and why do they get turned into binaries? https://github.com/idi-ot/eex_test

iex(1)> EExTest.fortune_html [a: "a", b: "b"]
"<!DOCTYPE html>\n<html>\n  <head>\n    <title>Fortunes</title>\n  </head>\n  <body>\n    <table>\n      <tr><th>id</th><th>message</th></tr>\n      \n        <tr><td>a</td><td>a</td></tr>\n      \n        <tr><td>b</td><td>b</td></tr>\n      \n    </table>\n  </body>\n</html>\n"

I would’ve expected something like this

iex(3)> EExTest.custom_fortune_html [a: "a", b: "b"]
["<!DOCTYPE html>\n<html>\n  <head>\n    <title>Fortunes</title>\n  </head>\n  <body>\n    <table>\n      <tr><th>id</th><th>message</th></tr>\n",
 ["<tr><td>", "a", "</td><td>", "a", "</td></tr>", "<tr><td>", "b", "</td><td>",
  "b", "</td></tr>"], "    </table>\n  </body>\n</html>\n"]

Some tests

# render small list

iex(6)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.fortune_html([a: "a", b: "b"]) end) end)
{480068, :ok}

iex(7)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.custom_fortune_html([a: "a", b: "b"]) end) end)
{279135, :ok}
# render empty list

iex(10)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.fortune_html([]) end) end)
{144888, :ok}

iex(11)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.custom_fortune_html([]) end) end)
{74700, :ok}
# render bigger list

iex(12)> data = Enum.map(1..1000, fn i -> {i, i} end)

iex(14)> :timer.tc(fn -> Enum.each(1..1_000, fn _ -> EExTest.custom_fortune_html(data) end) end)
{277821, :ok}

iex(15)> :timer.tc(fn -> Enum.each(1..1_000, fn _ -> EExTest.fortune_html(data) end) end)
{1078125, :ok}
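
For what it’s worth, the reason iolists pay off here is that the VM can write a nested iolist straight to the socket without flattening it into one binary first; you only pay for flattening when you explicitly ask for it. A small sketch:

```elixir
# Build the response body as an iolist: nested lists of binaries.
# Sockets and ports accept this shape directly, so no large
# intermediate binary needs to be constructed per request.
rows = [{"a", "a"}, {"b", "b"}]

body = [
  "<table>",
  Enum.map(rows, fn {id, msg} ->
    ["<tr><td>", id, "</td><td>", msg, "</td></tr>"]
  end),
  "</table>"
]

# Flatten only when a single binary is really needed:
IO.iodata_to_binary(body)
# => "<table><tr><td>a</td><td>a</td></tr><tr><td>b</td><td>b</td></tr></table>"
```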
2 Likes

You might want to read this series of blog posts (this one is the last one, I think) http://dbeck.github.io/Wrapping-up-my-Elixir-TCP-experiments/. There is a part about beam parameters near the end and in the comment section. And docs (Emulator Flags section) http://erlang.org/doc/man/erl.html
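
If you want to experiment with emulator flags without editing vm.args, they can be passed at startup. A sketch; the specific flags below are examples only, so check erl(1) for your OTP version before relying on them:

```shell
# ELIXIR_ERL_OPTIONS forwards flags to the Erlang emulator.
# +K true    - enable kernel poll (epoll/kqueue)
# +sbwt none - reduce scheduler busy-waiting
ELIXIR_ERL_OPTIONS="+K true +sbwt none" MIX_ENV=prod PORT=4000 elixir -S mix phx.server
```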

1 Like

Thanks

This looks really interesting esp. the FreeBSD part :smile:

Edit: apparently Mr Beck switched to Rust a year ago. Looking for a C++ alternative?
It’s getting harder and harder to choose the right language: Java, Scala, PHP, Go, Erlang, Elixir, Rust, Ruby, Crystal, Pony, Nim… Oh, did I forget JS? :cold_sweat:

max_keepalive might matter

http://theerlangelist.com/article/phoenix_latency

There have also been several threads on elixirforum about performance in benchmarks. Some of them might be relevant to you.

The which_is_the_fastest benchmark used a strange benchmarking tool, though, so those results might be discarded.

cowboy1-and-cowboy2 is also somewhat controversial: in a blog post by @potatosalad (Load Testing cowboy 2.0.0-rc.1 · potatosalad) cowboy2 was much faster than cowboy1, while in my tests it was the opposite (different benchmarking tools).

1 Like

Some general notes:

  1. Make sure to remove unnecessary plugs from your endpoint.

  2. Make sure you are comparing apples with apples. For example, if one is logging and the other isn’t, that’s going to make a huge difference in performance.

  3. Make sure to set max_keepalive, otherwise connections will be recycled over and over again.

  4. A larger connection pool often implies worse performance, because there is less resource sharing.

But in general, if you are getting 408ms for such a simple page, @OvermindDL1 is right: there is something absurdly wrong.

Thanks for all your suggestions !

408ms is the average from the wrk load test above. I’m going to try to improve the results; I think I’ll get better load times by tuning the Erlang VM/Elixir and FreeBSD.

Well, if someone could perform the same tests and confirm the Cowboy/fasthttp load time ratio… It should not take more than 15 minutes :smile:

Regarding max_keepalive: before increasing it I want to understand exactly how connection handling works in Cowboy, and assess whether this server is subject to slow HTTP DoS attacks such as described here:


In one of the posts above I saw someone setting it to 5_000_000 … wow :sweat_smile:

And assess whether this server is subject to slow HTTP DoS attacks such as described here:

Still, I usually put haproxy in front of erlang webservers (I usually use elli).

Cowboy’s max_keepalive is not a timeout, it’s the maximum number of HTTP requests that can be done reusing the same connection. It has a separate timeout and maximum number of connections.

https://github.com/ninenines/cowboy/commit/a013becc66b50db038c1f7f3539040b4482bba18

https://github.com/ninenines/cowboy/commit/5d698250b228229001ca9966a390bf1545fcd0b0
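
With Phoenix on Cowboy this option is usually passed through the endpoint’s http options. A sketch; the :hello app and Hello.Endpoint names are placeholders:

```elixir
# config/prod.exs - sketch; :hello / Hello.Endpoint are hypothetical names
config :hello, Hello.Endpoint,
  http: [
    port: 4000,
    # maximum number of requests served over a single keep-alive
    # connection before Cowboy closes it (a count, not a timeout)
    protocol_options: [max_keepalive: 5_000_000]
  ]
```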

1 Like

So I’ve re-run several tests setting max_keepalive to 5_000_000.

With:
wrk -t12 -c12 -d1s -T10s http://.../fortune
I get an average latency of 8.3 ms and max latency of 100ms.

With (same as my initial tests):
wrk -t6 -c12000 -d60s -T10s http://.../fortune
I get an average latency of 404ms and a max latency of 707ms.

Max latency is much better! 700ms instead of 8000ms.
I’ve also tried lowering the DB pool size to 20; average latency decreased to 390ms.
Still, the average latency remains around 400ms.

Thanks for the links regarding the slow attacks, I’ll have to read all of this. Even if the Cowboy servers are not front-facing, I prefer to be very cautious when dealing with such config params, which might as well induce resource exhaustion. I’ll run some tests when I have time.

I’m going deeper underground :smile:

1 Like