7.5-second average page load speed for 200 visitors hitting a Phoenix-driven website, on a $5/month DigitalOcean Droplet

Tags: performance, phoenix

#1

Hi,

Preface: I’m still developing my first Elixir app and haven’t shipped anything to production yet, but I will be shipping a mission-critical application by the end of this year.

The author of an Elixir screencast series recently released a new episode about caching database calls with ETS.

Early on in this video he benchmarks two pages: one that gets a single database record by ID and another that lists a bunch of records.

In both test cases, he was averaging about 7 to 7.5 full seconds for Phoenix to provide a response when ~200 visitors were hitting the website concurrently.

That episode can be found here https://www.youtube.com/watch?v=wmktvDHde6w (the first 4 minutes or so talk about setting up the benchmark conditions).

I very rarely base my technical decisions on benchmarks, but I know a lot of the Elixir/Phoenix marketing is based on how well it performs and how you don’t need to do complicated things like Russian doll caching to get very respectable performance.

But in this case, it looks like a pretty typical Phoenix app running in production with no optimizations is nearly falling over with ~200 people hitting the site concurrently.

What’s really interesting is that he hand-rolls his own ETS-based cache throughout the episode, and even with caching it still took 4 to 5 seconds to render a response.
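
For reference, the general shape of a hand-rolled ETS cache is a read-through lookup table; a rough sketch of that pattern (module and function names are mine, not taken from the episode) looks something like this:

defmodule DemoCache do
  # Rough sketch of a read-through ETS cache, not the code from the episode.
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def init(:ok) do
    # A public named table so any process can read it directly;
    # the GenServer exists mainly to own the table.
    :ets.new(__MODULE__, [:named_table, :public, read_concurrency: true])
    {:ok, %{}}
  end

  # Return the cached value for `key`, or compute it with `fun` and cache it.
  def fetch(key, fun) when is_function(fun, 0) do
    case :ets.lookup(__MODULE__, key) do
      [{^key, value}] ->
        value

      [] ->
        value = fun.()
        :ets.insert(__MODULE__, {key, value})
        value
    end
  end
end

# Hypothetical usage:
# DemoCache.fetch({:episode, id}, fn -> Repo.get!(Episode, id) end)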

Can someone shed some light on what’s going on here, or report back how their Phoenix sites are doing in production? What steps have you taken to improve performance?

Edit: Of course I’m still going to develop my app with Phoenix no matter what, because I enjoy working with the framework, but it would be cool to assemble a list of pitfalls to avoid or quick wins for improving performance.


#2

I just published that screencast.

I’m a little bit intrigued by your response since I originally did most of my development in Node and have found nearly an order of magnitude better performance with Phoenix for a few different sites I’ve migrated.

The page load speeds are based on Loader.io’s reports. At least in my own testing, the majority of the time to load a page comes from external scripts, right up to the point where the load is close to the most the server can handle.

On plain .eex templates, I can get about 1200 reqs/second with Nginx in front and over 2000 if Phoenix handles port 80 directly, but the pages you’re referring to are part of a CMS. It’s definitely not a typical Phoenix app in that I’m loading text from a database, running it through a Markdown compiler, and then applying custom regex expansions of syntactic sugar I’ve added on top. It’s optimized for my productivity, not server performance. That said, random strangers on the internet have been offering unsolicited praise for the page load speeds, so I’m not super worried about it at this point. I would definitely upgrade to the $10 or even $15/month server and take another look at optimizations if the site were getting even remotely close to 200 visits per second.

I thought about demoing a bare site that had much higher numbers but opted to go with a “real world” use case. I sure don’t want to be scaring people off of Phoenix, though.

What stack are you currently running and what kind of loads can it handle with loader.io on a $5/month droplet?


#3

He’s on a $5-a-month DigitalOcean droplet :slight_smile: What kind of performance were you expecting on that? I’m pretty sure some of my Ruby/PHP apps would crumble on that, let alone manage 2000 reqs/second.


#4

I think I achieved >10k req/s in my benchmarks with elli (and probably Cowboy as well) on 512 MB DigitalOcean droplets, but that was about a year ago. DigitalOcean has improved their offerings since.


#5

It’s my fault for the inline Markdown, really. The pages that don’t do that handle over 1k/second.

What I should have done was use a different site or integrate Nabo to precompile the Markdown (and my additions to it) before digging into ETS.


#6

“What I should have done was use a different site or integrate Nabo to precompile the Markdown (and my additions to it) before digging into ETS.”

Just make sure you use iodata (ideally precompiled into functions as well), then your rendering speed will be fine.
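
To illustrate what that means (a generic example, not code from the site): instead of concatenating binaries, you build nested lists of binaries, and the VM can write them out without copying them into one big string.

title = "Welcome"

# Concatenation builds a brand new binary on every append:
html = "<li>" <> title <> "</li>"

# Building iodata just nests references; Cowboy can write the nested list
# straight to the socket without flattening it into one big string first:
html = ["<li>", title, "</li>"]

IO.iodata_to_binary(html)
#=> "<li>Welcome</li>"

Phoenix’s compiled .eex templates already render to iodata, which is part of why plain template pages are so cheap to serve.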

“Can someone shed some light on what’s going on here, or report back how their Phoenix sites are doing in production? What steps have you taken to improve performance?”

@nickjanetakis I’m not using Phoenix, but I generally don’t have problems with Erlang web servers performance-wise.


#7

I think it is a typical Phoenix app, because it’s doing a bunch of real-world things: rendering templates, accessing the DB and performing various other I/O. That’s a great example, and much better than a contrived benchmark where you return an empty 200 response.

Is markdown being parsed and compiled on your episode listing page too?


#8

No, it’s not. It’s just loading all the episodes with entities and topics preloaded (plus the standard event analytics done all over the site).


#9

“No, it’s not. It’s just loading all the episodes with entities and topics preloaded (plus the standard event analytics done all over the site).”

Oh, because in a previous reply you wrote “It’s my fault for the inline Markdown, really. The pages that don’t do that handle over 1k/second”.

But if your episode listing page doesn’t deal with compiling Markdown on the fly, then it sounds like something else is causing the slowdown?


#10

Single episode pages break at a lower threshold than the listings and query far less data! :sweat:

I should have said “the stuff served by my page controller”. E.g., the home page, the TOS page, the various easter egg pages, etc. can handle 1k/sec (according to loader.io).

tl;dr: the inline Markdown has a huge performance cost, loading nearly 100 episodes is fairly expensive (but convenient for the user), and just rendering a “normal” template is fast and cheap.


#11

His numbers were 200 requests, not 2,000 (big difference).

But as for expectations, I would have anticipated less latency per request, because 5 to 7 seconds is a really, really long time. But I’m also not sure if loader.io is measuring the response from the server or the total page load time, which are two very different numbers.


#12

Here are some interesting numbers for you. I just ran wrk, a popular web benchmarking tool, against a couple of your pages.

I didn’t want to run too many tests out of respect for your site, but here are a couple. The gist of it is that I ran it against your episode listings, single episode pages and your TOS page with both 50 and 200 concurrent users.

The numbers are actually quite impressive, because this measures the response time of your web server, not the total page load time (which includes external scripts like Google Analytics, etc.).

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/episodes
Running 30s test @ https://alchemist.camp/episodes
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   425.24ms  133.77ms   1.24s    93.90%
    Req/Sec    15.58      7.48    50.00     83.53%
  3331 requests in 30.05s, 165.44MB read
Requests/sec:    110.85
Transfer/sec:      5.51MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/episodes
Running 30s test @ https://alchemist.camp/episodes
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.30s   240.47ms   1.88s    80.63%
    Req/Sec    20.58     13.87   100.00     74.83%
  4105 requests in 30.05s, 205.56MB read
Requests/sec:    136.61
Transfer/sec:      6.84MB

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/episodes/welcome
Running 30s test @ https://alchemist.camp/episodes/welcome
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   809.05ms   54.74ms 966.09ms   86.90%
    Req/Sec     9.04      5.84    30.00     67.53%
  1718 requests in 30.04s, 19.18MB read
Requests/sec:     57.19
Transfer/sec:    653.81KB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/episodes/welcome
Running 30s test @ https://alchemist.camp/episodes/welcome
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.59s   293.12ms   1.99s    77.11%
    Req/Sec    10.18      7.35    40.00     72.05%
  1698 requests in 30.06s, 18.96MB read
  Socket errors: connect 0, read 0, write 0, timeout 1449
Requests/sec:     56.49
Transfer/sec:    645.82KB

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/terms
Running 30s test @ https://alchemist.camp/terms
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   287.26ms   11.58ms 355.61ms   70.93%
    Req/Sec    21.18     10.35    50.00     59.86%
  4781 requests in 30.05s, 38.66MB read
Requests/sec:    159.09
Transfer/sec:      1.29MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/terms
Running 30s test @ https://alchemist.camp/terms
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   276.36ms   21.86ms 850.73ms   98.16%
    Req/Sec    85.48     29.45   171.00     71.90%
  19525 requests in 30.04s, 157.86MB read
Requests/sec:    649.90
Transfer/sec:      5.25MB

Edit: For context, I just ran the same tool against my personal website, which is also running on a $5/month DigitalOcean server, except it’s a static site (Jekyll) running behind Nginx.

I know we’re comparing apples and oranges here, but it really puts into perspective how fast Phoenix is, because I imagine even AlchemistCamp’s terms of service page performs at least 1 DB query?

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://nickjanetakis.com
Running 30s test @ https://nickjanetakis.com
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   109.79ms   17.65ms 339.29ms   92.89%
    Req/Sec    54.07      8.37   101.00     86.20%
  12948 requests in 30.10s, 245.27MB read
Requests/sec:    430.15
Transfer/sec:      8.15MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://nickjanetakis.com
Running 30s test @ https://nickjanetakis.com
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   444.70ms   93.91ms   1.82s    90.44%
    Req/Sec    54.24     22.41   158.00     64.06%
  12857 requests in 30.03s, 243.61MB read
  Socket errors: connect 0, read 0, write 0, timeout 20
Requests/sec:    428.08
Transfer/sec:      8.11MB

#13

We need more information here on what the actual server request times are, not what loader.io is reporting. It also sounds like the server is doing a lot of work, but it’s not clear how much work that is. Elixir and Phoenix scalability has been proven out in the wild time and time again, but if a server is taking 8 seconds to complete “work”, then no web server or web framework is going to make that 8 seconds go any faster. If it has to be done, it has to be done.

Now, where Phoenix really helps is that if you are indeed blocking on highly latent work, a few requests won’t nuke the server and your app will remain responsive, whereas with, say, Node.js, any one thing going CPU/IO bound will block the entire event loop. The Erlang scheduler will make sure processes get their fair share even if some requests are doing heavy work taking 8s.


#14

It ‘sounds’ like it is performing multiple large database lookups, then combining them into pages, processing them via a Markdown -> HTML processor, then combining those further, and who knows what else, which is indeed quite a bit of work. It would be nice if we had a GitHub link so we could look at it; it sounds like there are ample optimization opportunities there! ^.^


#15

The worst-case page load time is interesting (perhaps the most interesting thing for tuning), but it often comes from network errors. I wrote a blog post on the impact of network latency, errors and concurrency on benchmarks.

If you are serving your app with Nginx, then running out of sockets will cause requests to queue up, causing delays in the 5-7 second range.

After that, looking into the state of the server when it’s under load would show where the bottleneck is. The cheap Digital Ocean servers only have one CPU core, so that’s often the fundamental limit on performance.

If you run out of RAM, particularly when calling an external process to do the work, then you can’t spawn OS processes, and the overall system performance will drop by orders of magnitude as it thrashes. When you have lots of OS processes fighting over a limited number of CPUs, you can lose 25% of your CPU performance to task switching.

We took over a PHP site that was having serious performance problems. In addition to a lot of standard web traffic, the site used Elasticsearch to perform complex queries which might take 30 seconds. The site used Apache mod_php, so each process consumed 10-20 MB of RAM and a MySQL DB connection while sitting there waiting for Elasticsearch. The system would run fine until it hit a load spike, then it would quickly collapse. Putting Phoenix in front of the Elasticsearch requests made them much lighter weight, and we could limit the number of concurrent requests to what the system could handle.


#16

You’re not wrong at all! It would take some time to decide how and what to publish, but there are still many, many optimization opportunities and I’m just one person who can only put an hour or two a day into this.

If and when traffic is up ~100x-1,000x from what it is now, I’ll have a more pressing need and hopefully the time to really dig into optimization, and I’ll seriously consider your idea of open sourcing the whole site (minus the secrets).


#17

An obvious optimisation is to do the Markdown -> HTML conversion when saving each chunk into the database. That way, you don’t have to do it every time you load a page; this is at the expense of slowing down the CMS admin side slightly.
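
Roughly, the idea (with made-up module and field names, and Earmark standing in for whatever Markdown library is actually in use) would be something like:

defmodule MyCMS.Page do
  use Ecto.Schema
  import Ecto.Changeset

  schema "pages" do
    field :body, :string       # raw Markdown, edited in the admin
    field :body_html, :string  # rendered once here, read on every page view
    timestamps()
  end

  def changeset(page, attrs) do
    page
    |> cast(attrs, [:body])
    |> put_rendered_html()
  end

  # Only re-render when the Markdown actually changed.
  defp put_rendered_html(changeset) do
    case get_change(changeset, :body) do
      nil -> changeset
      body -> put_change(changeset, :body_html, Earmark.as_html!(body))
    end
  end
end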


#18

Hey, all. I put out another video that goes over some of this feedback and clarifies that it’s mostly the custom features slowing things down.

Also, I’m really curious how much performance optimization any of you think even makes sense for a site that only gets a few hundred visits a day.

Since tweaking max_keepalive settings or building a cache with ETS is general and potentially valuable to everyone, it seemed worth covering, but spending time optimizing code on my site that’s mostly irrelevant to others makes less sense to me unless or until there are actual issues handling the traffic.
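
For anyone wondering what the max_keepalive tweak looks like: it’s a Cowboy protocol option passed through the endpoint’s HTTP config, roughly like this (the app name, endpoint name and value are placeholders, and the exact keys depend on the Cowboy/Phoenix versions in use):

# config/prod.exs -- the 5_000 is arbitrary; keys assume the Cowboy adapter
config :my_app, MyAppWeb.Endpoint,
  http: [port: 4000, protocol_options: [max_keepalive: 5_000]]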


#19

User retention. I would not keep coming back to a site if I had to wait 7-8 seconds on every click.

Is the educational value not worth it to you? This is basically production-level work and priceless experience (unless you’ve already done it 10 times before).


#20

In that video you mentioned that you auto-increment view counts on every page view, and that it would be pointless to keep the cache updated since it would get busted on every view.

I think this is why DHH and team came up with the Russian doll caching strategy in Rails. It’s where you can cache and invalidate a smaller section of a page. For example, the view count would be in its own cache, and updating the view count would not invalidate the rest of the page’s cache.
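
Something loosely analogous in Phoenix terms (just a sketch, not Rails’ actual mechanism; FragmentCache, render_markdown and the field names are all hypothetical) would be to key the expensive fragment on the record and its updated_at, and leave the view count out of that cache entirely:

# In the controller: cache only the expensive rendered body, keyed by the
# record and its updated_at; bumping the view counter never touches that key.
def show(conn, %{"id" => id}) do
  episode = Repo.get!(Episode, id)

  body_html =
    FragmentCache.fetch({:episode_body, episode.id, episode.updated_at}, fn ->
      render_markdown(episode.body)
    end)

  render(conn, "show.html", episode: episode, body_html: body_html)
end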

The problem with Russian doll caching is that it’s a major pain in the ass to set up, and there are a lot of edge cases and boilerplate to wire it all up. It’s why Rails spent so much time implementing helpers at the framework level to do it.

I think a lot of people got at least partially interested in Elixir/Phoenix because the pitch is that you can get really nice performance without complex caching strategies (and save those complex caching strategies for when you’re really large and need them).