Evaluating Elixir / Phoenix for a web-scale, performance-critical application

Tags: performance, phoenix

#1

Hi all - cross-posting this from the elixir-talk mailing list. I could use some help. I am currently evaluating Elixir and Phoenix for a performance-critical application for a Fortune 500 company. This could be another great case study for Elixir and Phoenix if I can show that it can meet our needs. Initial performance testing looked phenomenal, but I am running into some performance concerns that will force me to abandon this tech stack entirely if I cannot make the case.

The setup: an out-of-the-box Phoenix app generated with mix phoenix.new. No Ecto. It returns a static JSON response. Basically a hello-world app.
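For reference, the controller action under test is essentially just this (the payload here is a simplified stand-in for my actual static response):

defmodule MarketApi.ProductController do
  use MarketApi.Web, :controller

  # Return a tiny static JSON payload so the benchmark measures
  # framework overhead rather than application logic.
  def index(conn, _params) do
    json(conn, %{products: []})
  end
end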

The hardware:

  • MacBook Pro, 16 GB RAM, 8 cores, 2.5 GHz, running Elixir/Phoenix natively and also inside a Docker container
  • Amazon EC2 t2.medium running the Elixir Docker image

The tests: ab, wrk, siege, artillery, and curl with a variety of configurations, up to 100 concurrent connections. Not super scientific, I know… but still.

No matter what I try, Phoenix logs impressive numbers to stdout - generally on the order of 150-300 microseconds per request. However, none of the load-testing tools agree. No matter the hardware or load-test configuration, I see around 20-40 ms response times. The goal for the services I am designing is 20ms and several thousand requests per second. The load tests that @chrismccord and others have published suggest I should be able to expect 3ms or less against localhost, but I’m not seeing anything close to that.

Would anyone be willing to work with me to look at some options here? I’d be incredibly grateful. Don’t make me go back to Java, please :slight_smile: Is what I’m asking even possible?


#2

I’ve read several times that for performance testing you should run the app in production mode.

MIX_ENV=prod mix compile.protocols
MIX_ENV=prod PORT=4001 elixir -pa _build/prod/consolidated -S mix phoenix.server

This is from the 0.7.2 docs so I’m not sure it’s all still needed, but might be worth a try.


#3

Thanks for the great suggestion @terakilobyte - I just tried running it in PROD mode:

MIX_ENV=prod mix compile
MIX_ENV=prod mix phoenix.digest
MIX_ENV=prod PORT=4001 mix phoenix.server

./wrk -t8 -c100 -d30S --timeout 2000 http://localhost:4001/api/products

Running 30s test @ http://localhost:4001/api/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 29.16ms 32.83ms 365.59ms 94.76%
Req/Sec 494.23 139.45 770.00 76.14%
116284 requests in 30.06s, 71.44MB read
Requests/sec: 3868.18
Transfer/sec: 2.38MB

Still not very good unfortunately :frowning:


#4

Interestingly, with only 10 concurrent connections:

Running 10s test @ http://localhost:4001/api/products
8 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.18ms 3.91ms 40.59ms 88.15%
Req/Sec 1.12k 249.24 1.78k 68.38%
89005 requests in 10.01s, 54.68MB read
Requests/sec: 8890.15
Transfer/sec: 5.46MB


#5

There was a similar thread recently. You may find some tips there.

Some low-hanging fruit to improve perf would be:

  1. Raise the log level in prod to :warn to suppress logging each request (see the config sketch below).
  2. If you’re testing a REST endpoint, make sure it goes through the :api pipeline, not the :browser one.
  3. Build an OTP release and bench against that.

Also, check your CPU usage while testing. If your CPUs are not all constantly near 100%, you may have a bottleneck somewhere.
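For point 1, it’s a one-line change in config/prod.exs (a minimal sketch; tune the level to taste):

# config/prod.exs
# :warn suppresses the per-request lines Phoenix logs at :info
config :logger, level: :warn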


#6

Check out this micro-benchmarking tool released by a member of the community: Benchee and BencheeCSV 0.1.0 release - easy and extensible (micro) benchmarking
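Usage is roughly like this (check the README for the exact call in the version you install; the job name and function below are just placeholders):

Benchee.run(%{
  "encode static product payload" => fn -> Poison.encode!(%{products: []}) end
})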


#7

To expand on Sasa’s point, the :browser pipeline by default generates CSRF tokens to be used in forms as a security measure. These can be fairly expensive to generate, at least compared to a hello_world JSON endpoint. Definitely make sure you aren’t doing that.
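Concretely, the pipelines a fresh Phoenix app generates look roughly like this; the CSRF work happens in :protect_from_forgery, which only lives in the :browser pipeline:

pipeline :browser do
  plug :accepts, ["html"]
  plug :fetch_session
  plug :fetch_flash
  plug :protect_from_forgery   # generates/verifies CSRF tokens - relatively costly
  plug :put_secure_browser_headers
end

pipeline :api do
  plug :accepts, ["json"]      # no session, no CSRF work
end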


#8

Thank you for your quick response on this @sasajuric. I looked around in the forums first but didn’t see the other thread. Lots of great information over there to try. Pretty sure I’m piping through :api - here’s my router:

defmodule MarketApi.Router do
  use MarketApi.Web, :router

  pipeline :api do
    plug :accepts, ["json"]
  end

  scope "/api", MarketApi do
    pipe_through :api
    resources "/products", ProductController, except: [:new, :edit]
  end
end

CPU is pretty well maxed out across all cores when I run the tests.

Thanks @benwilson512 - I’m super green to Phoenix, so I appreciate the tip. I think I am using :api. Is there anything I need to do besides pipe_through :api?

Quoting @thinkpadder1 (post #6): “Check out this micro-benchmarking tool released by a member of the community: Benchee and BencheeCSV 0.1.0 release - easy and extensible (micro) benchmarking”

I hadn’t seen this before - I’ll give it a whirl. Is it possible to use the tool to find out where Phoenix is spending its time?


#10

Try to make a release with exrm, and then start as a service:
rel/your_app/bin/your_app start.
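The steps are roughly (assuming exrm is already in your deps; the app name is a placeholder):

# add {:exrm, "~> 1.0"} to deps in mix.exs first (version is illustrative), then:
MIX_ENV=prod mix release
rel/your_app/bin/your_app start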


#11

Also, check out the Observer GUI tool that ships with Erlang/OTP.
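You can open it from an IEx session attached to the running app, e.g.:

# start the server inside IEx so you can poke at it while wrk is running
iex -S mix phoenix.server

# then, at the IEx prompt:
:observer.start()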


#12

Could you put your test app on GitHub or somewhere similar? It would help us find the cause.


#13

Hi all, just wanted to say thank you for all of your timely help on this. I have been working through the suggestions a bit at a time as time permits, and I will put up some code as soon as I can. So far, turning off all of the output to stdout and building an exrm release have provided some improvements - though there are still a few scenarios where running via MIX_ENV=prod mix outperforms even the exrm release.

I’ll update as soon as I have some more information. Thanks again!


#14

Any updates, Matt? I came to this thread from Sasa’s blog post, and I remember a Hacker News thread where benchmarks were run at Rackspace. In your original post you mentioned going back to Java, so I thought I’d link you to their tests: https://gist.github.com/omnibs/e5e72b31e6bd25caf39a


#15

A few things to consider:

  1. If you’re running tests from outside AWS, you might see extra latency from the network path to AWS.
  2. Try an AWS instance with SSD (instance-store) volumes instead of EBS volumes.

If you can share the sample code, I can take a look and make better suggestions.


#16

Thanks for this @sudostack - I have been concentrating on getting some benchmarks for Go, Java and Elixir since I last checked in. Elixir is next up and I hope to get some more data this week. Thanks for the link to the tests - it would be awesome to see some updated numbers. It’s impressive to see the throughput Phoenix was getting back then; I’ve been getting <20k RPS on my MacBook.

Quoting @subbu05 (post #15): “If you’re running tests from outside AWS you might see some delay because of the connection time to AWS. Try an AWS instance with SSD volumes instead of EBS volumes.”

I’m going to try hitting it from either the same box or one within the same VPC. Good advice on #2. We were spinning these up in Docker containers, which people don’t seem to do much with Elixir. I haven’t seen a good reason why not, but I am curious what overhead Docker introduces.
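One thing I plan to try on the Linux boxes is host networking, to rule out the docker-proxy/NAT layer (the image name below is just a placeholder):

# bypass the bridge network entirely so the container shares the host's stack
docker run --rm --net=host my_elixir_phoenix_image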


#17

I have been doing similar benchmarks for my application, which is a JSON API, and I have found that Elixir/Phoenix is not the fastest thing out there (nor does it claim to be). But combined with its balance of productivity and performance, in my opinion it beats Scala, Java, Go and the rest - though I am coming from Ruby/Rails, so its similarity to Ruby was important to me too. That said, I have worked with Scala and Java before and cannot see frameworks such as Play being anywhere near as easy to develop with, even if they will probably get more requests per second out of a single box. From what I have read, Elixir also scales in a more predictable way as you add hardware - though I have not got to that point yet.


#18

Phoenix is fast, but only as fast as your pipeline allows it to be. The point of Phoenix is to be near the fastest while offering exceptional reliability.

However, my main point for this post: *DO*NOT*TEST*FROM*WINDOWS*. We learned that the hard way at work. If the server or the testing client is on Windows (at least Windows 10), it introduces close to 200ms of latency on initial connections while it ‘fills the TCP buffer’. At least on Windows 10 we’ve tried everything to disable it - registry edits, setting the TCP connection with NONAGLE, and a hundred things in between…