Performance - Best practices?

Hi all,

I’ve run the TechEmpower fortune benchmark locally with Elixir/Phoenix and Golang/fasthttp.
I removed all unnecessary code in both the Elixir and Go versions.
I’m trying to get the best performance out of the Elixir/Phoenix version.

Tuning for Elixir/Phoenix:

  • disabled all unnecessary plugs in router.ex:

    pipeline :browser do
      plug :accepts, ["html"]
    end

  • set DB adapter pool size to 40

  • used the same HTML template as Go

  • set logging to error

  • compiled protocols

  • start server: MIX_ENV=prod PORT=4000 elixir -S mix phx.server
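
For anyone reproducing this, the pool size and logging items in the list above would roughly correspond to a config fragment like the one below. This is only a sketch: the application name `:hello` and the module `Hello.Repo` are assumptions, not taken from the benchmarked code, and older projects would start the file with `use Mix.Config` instead of `import Config`.

```elixir
# config/prod.exs (sketch; :hello and Hello.Repo are hypothetical names)
import Config

# Only log errors, so logging does not skew the benchmark.
config :logger, level: :error

# DB adapter pool size, as set in the list above.
config :hello, Hello.Repo,
  pool_size: 40
```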

No tuning for Go.

I use wrk to perform the load tests:
wrk -t6 -c12000 -d60s -T10s http://.../fortune

Client is physical and server is a domU/2vcpu/4GB. OS: FreeBSD. 10 runs for each load test.

For Elixir/Phoenix:

  • Average latency 408ms, max 8s
  • Req/s 2400

For Golang:

  • Average latency 123ms, max 1s
  • Req/s 8600

I did my first Go tests with net/http and peaked at 4000 req/s, so I replaced it with fasthttp and got 8500+ req/s.
Are there faster alternatives to Cowboy?
How do you fine-tune a production environment? What OS is best suited for the Erlang VM? What FS? Any specific OS tunables?



Uh, you have something MAJOR wrong there. Can you upload this to GitHub and give us the command to git clone it, then a single command (or set of commands) to run our own tests? Even the Golang average latency of 123ms seems crazy high!


-c12000 isn’t that 12k keepalive connections, seems a bit much…

techempower is not real world - quite the opposite actually… so do your own benchmark similar to what your requirements are…

wrk is not a great benchmark tool… use something like

also allow me to suggest this video


also since it’s go vs elixir(otp/erlang) this one is great at understanding latency and GC handling etc.



Are there faster alternatives to Cowboy ?

There is elli; see TechEmpower Benchmarks Round 14.

 +        for _ <- 1..n do
 +          id = rand_id()
 +          [row] = Hello.SQL.query(conn, "world_by_id", @world_by_id, [id])
 +          world(row)
 +        end

This might be replaced with a map, since a for comprehension in Elixir does a reduce where a map would suffice, IIRC. Though it’ll hardly change anything.
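
To make the suggestion concrete, here is a minimal sketch of the two forms side by side. `WorldSketch` and `fetch_world/1` are hypothetical stand-ins for the real query code, which is not reproduced here; both functions should return a list of `n` results.

```elixir
defmodule WorldSketch do
  # Hypothetical stand-in for the database lookup in the snippet above.
  def fetch_world(id), do: %{id: id, randomnumber: id * 2}

  # Comprehension form, as in the quoted snippet.
  def with_for(n) do
    for _ <- 1..n do
      fetch_world(rand_id())
    end
  end

  # Enum.map form, as suggested in this post.
  def with_map(n) do
    Enum.map(1..n, fn _ -> fetch_world(rand_id()) end)
  end

  # Random id in 1..10_000 (the range is an assumption).
  defp rand_id, do: :rand.uniform(10_000)
end
```

Either way the database round trip will most likely dominate, so this is more about style than speed.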

I’m not sure if .eex templates compile to iolists. If not, then that’d also be a possible improvement in performance.

They do.

Thanks for Elli, I’ll give it a try :slight_smile:

To reproduce this test, just check out the TechEmpower code for the elixir/phoenix and golang/fasthttp versions.
For the phoenix version you may also bootstrap an empty app and add a fortune MVC.
Then add some basic tuning as I’ve done. The average latency is high probably because of the number of incoming connections. That’s the point. Same conditions for the Go version. I’d like to optimize at least:

  1. the average latency ratio between Elixir and Go: almost 4 :fearful:
  2. get rid of the max latency of 8s, which is a lot

The techempower/fortune benchmark gives us a good basis to apply the most basic tuning. That’s what I’m interested in now, and that’s why I use wrk and not JMeter. Later, if we choose Elixir, I’ll perform more advanced load tests.

Besides plugs, DB pool size, and logging, are there other config parameters I could change to improve performance? BEAM parameters I could tune, JVM-style?

Then at what point and why do they get turned into binaries?

iex(1)> EExTest.fortune_html [a: "a", b: "b"]
"<!DOCTYPE html>\n<html>\n  <head>\n    <title>Fortunes</title>\n  </head>\n  <body>\n    <table>\n      <tr><th>id</th><th>message</th></tr>\n      \n        <tr><td>a</td><td>a</td></tr>\n      \n        <tr><td>b</td><td>b</td></tr>\n      \n    </table>\n  </body>\n</html>\n"

I would’ve expected something like this

iex(3)> EExTest.custom_fortune_html [a: "a", b: "b"]
["<!DOCTYPE html>\n<html>\n  <head>\n    <title>Fortunes</title>\n  </head>\n  <body>\n    <table>\n      <tr><th>id</th><th>message</th></tr>\n",
 ["<tr><td>", "a", "</td><td>", "a", "</td></tr>", "<tr><td>", "b", "</td><td>",
  "b", "</td></tr>"], "    </table>\n  </body>\n</html>\n"]

Some tests

# render small list

iex(6)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.fortune_html([a: "a", b: "b"]) end) end)
{480068, :ok}

iex(7)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.custom_fortune_html([a: "a", b: "b"]) end) end)
{279135, :ok}

# render empty list

iex(10)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.fortune_html([]) end) end)
{144888, :ok}

iex(11)> :timer.tc(fn -> Enum.each(1..100_000, fn _ -> EExTest.custom_fortune_html([]) end) end)
{74700, :ok}

# render bigger list

iex(12)> data =, fn i -> {i, i} end)

iex(14)> :timer.tc(fn -> Enum.each(1..1_000, fn _ -> EExTest.custom_fortune_html(data) end) end)
{277821, :ok}

iex(15)> :timer.tc(fn -> Enum.each(1..1_000, fn _ -> EExTest.fortune_html(data) end) end)
{1078125, :ok}
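
For readers who want to try the iolist idea themselves, here is a minimal sketch of the rows part only. It is not the `EExTest` module used above (that code isn't shown in the thread); `FortuneSketch` is a hypothetical name, and HTML escaping is left out to keep it short.

```elixir
defmodule FortuneSketch do
  # Builds the table rows as iodata: a nested list of binaries,
  # with no string concatenation and no copying of the fragments.
  # Note: no HTML escaping is done in this sketch.
  def rows_iodata(fortunes) do
    for {id, message} <- fortunes do
      ["<tr><td>", to_string(id), "</td><td>", to_string(message), "</td></tr>"]
    end
  end

  # Flattens the same iodata into a single binary, which copies every fragment.
  def rows_binary(fortunes), do: IO.iodata_to_binary(rows_iodata(fortunes))
end

rows = FortuneSketch.rows_iodata(a: "a", b: "b")
IO.inspect(rows)
IO.puts(IO.iodata_to_binary(rows))
```

Functions that write to a socket or device, such as `:gen_tcp.send/2` and `IO.puts/1`, accept iodata directly, so the flattening step can usually be skipped.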

You might want to read this series of blog posts (this one is the last one, I think). There is a part about BEAM parameters near the end and in the comment section. See also the docs (Emulator Flags section).
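
Before changing any emulator flags, it may help to check what the running VM is actually using. A small sketch, to be pasted into iex on the server; the flags named in the comments (`+S`, `+A`, `+P`, `+Q`) are the standard `erl` ones, and which of them matter for this benchmark is an open question:

```elixir
# Current values of a few settings controlled by emulator flags.
vm_settings = [
  # +S: number of schedulers online (normally one per vcpu)
  schedulers_online: :erlang.system_info(:schedulers_online),
  # +A: size of the async thread pool
  async_threads: :erlang.system_info(:thread_pool_size),
  # +P: maximum number of simultaneous processes
  process_limit: :erlang.system_info(:process_limit),
  # +Q: maximum number of simultaneous ports (each open socket uses one)
  port_limit: :erlang.system_info(:port_limit)
]

IO.inspect(vm_settings, label: "VM settings")
```

With 12000 client connections, the port limit is probably the first value worth a look.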



This looks really interesting esp. the FreeBSD part :smile:

Edit: apparently Mr Beck switched to Rust a year ago. Looking for a C++ alternative ?
It’s getting harder and harder to choose the right language: Java, Scala, PHP, Go, Erlang, Elixir, Rust, Ruby, Crystal, Pony, Nim … Oh did I forget JS ? :cold_sweat:

max_keepalive might matter

There have also been several threads on elixirforum about performance in benchmarks. Some of them might be relevant to you.

which_is_the_fastest benchmark though had a strange benchmarking tool, so those results might be discarded.

cowboy1-and-cowboy2 is also somewhat controversial since in a blog post by @potatosalad cowboy2 was much faster than cowboy1 and in my tests it was the opposite (different benchmarking tools).


Some general notes:

  1. Make sure to remove unnecessary plugs from your endpoint.

  2. Make sure you are comparing apples with apples. For example, if one is logging and the other isn’t, that’s going to be a huge difference in performance.

  3. Make sure to set max_keepalive, otherwise connections will be recycled over and over again.

  4. A larger connection pool often implies worse performance, because there is less resource sharing.

But in general, if you are getting 408ms for such a simple page, @OvermindDL1 is right: there is something absurdly wrong.

Thanks for all your suggestions !

408ms is an average based on the wrk load test above. I’m going to try to improve the results; I think I’ll get better load times by tuning the Erlang VM/Elixir and FreeBSD.

Well if someone could perform the same tests and confirm the cowboy/fasthttp load time ratio … Should not take more than 15 minutes :smile:

Regarding max_keepalive: before increasing it I want to understand exactly how connection pooling works in Cowboy, and assess whether this server is subject to slow HTTP DoS attacks such as those described here:

In one of the posts above I saw someone setting it to 5_000_000 … wow :sweat_smile:

And assess if this server is subject to Slow HTTP DoS Attacks such as described here:

Still, I usually put haproxy in front of erlang webservers (I usually use elli).

Cowboy’s max_keepalive is not a timeout, it’s the maximum number of HTTP requests that can be done reusing the same connection. It has a separate timeout and maximum number of connections.
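
For reference, in a Phoenix app this option is usually passed through the endpoint's `http` settings, roughly as in the sketch below. The names `:hello` and `Hello.Endpoint` are assumptions, and the value is just the one mentioned earlier in the thread, not a recommendation.

```elixir
# config/prod.exs (sketch; :hello and Hello.Endpoint are hypothetical names)
import Config

config :hello, Hello.Endpoint,
  http: [
    port: 4000,
    # Handed to Cowboy: maximum number of requests served on one
    # keep-alive connection before Cowboy closes it.
    protocol_options: [max_keepalive: 5_000_000]
  ]
```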


So I’ve re-run several tests setting max_keepalive to 5_000_000.

wrk -t12 -c12 -d1s -T10s http://.../fortune
I get an average latency of 8.3 ms and max latency of 100ms.

With (same as my initial tests):
wrk -t6 -c12000 -d60s -T10s http://.../fortune
I get an average latency of 404ms and max latency of 707ms.

Max latency is much better ! 700ms instead of 8000ms
I’ve also tried to lower db pool size to 20, avg latency decreased to 390ms.
Though the average latency remains around 400ms.

Thanks for the links regarding the slow attacks, I’ll have to read all of this. Even if the Cowboy servers are not in front, I prefer to be very cautious when dealing with such config params, since they might as well induce resource exhaustion. I’ll run some tests when I have time.

I’m going deeper underground :smile:
