7.5-second average page load time for 200 visitors hitting a Phoenix-driven website - on a $5/month DigitalOcean Droplet

performance
phoenix

#12

Here are some interesting numbers for you. I just ran wrk, a popular web benchmarking tool, against a couple of your pages.

I didn’t want to run too many tests out of respect for your site, but here are a couple. The gist of it is that I ran it against your episode listings, single episode pages and your TOS page, with both 50 and 200 concurrent users.

The numbers are actually quite impressive because wrk measures the response time of your web server, not the total page load time (which would include external scripts like Google Analytics, etc.).

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/episodes
Running 30s test @ https://alchemist.camp/episodes
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   425.24ms  133.77ms   1.24s    93.90%
    Req/Sec    15.58      7.48    50.00     83.53%
  3331 requests in 30.05s, 165.44MB read
Requests/sec:    110.85
Transfer/sec:      5.51MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/episodes
Running 30s test @ https://alchemist.camp/episodes
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.30s   240.47ms   1.88s    80.63%
    Req/Sec    20.58     13.87   100.00     74.83%
  4105 requests in 30.05s, 205.56MB read
Requests/sec:    136.61
Transfer/sec:      6.84MB

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/episodes/welcome
Running 30s test @ https://alchemist.camp/episodes/welcome
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   809.05ms   54.74ms 966.09ms   86.90%
    Req/Sec     9.04      5.84    30.00     67.53%
  1718 requests in 30.04s, 19.18MB read
Requests/sec:     57.19
Transfer/sec:    653.81KB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/episodes/welcome
Running 30s test @ https://alchemist.camp/episodes/welcome
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.59s   293.12ms   1.99s    77.11%
    Req/Sec    10.18      7.35    40.00     72.05%
  1698 requests in 30.06s, 18.96MB read
  Socket errors: connect 0, read 0, write 0, timeout 1449
Requests/sec:     56.49
Transfer/sec:    645.82KB

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://alchemist.camp/terms
Running 30s test @ https://alchemist.camp/terms
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   287.26ms   11.58ms 355.61ms   70.93%
    Req/Sec    21.18     10.35    50.00     59.86%
  4781 requests in 30.05s, 38.66MB read
Requests/sec:    159.09
Transfer/sec:      1.29MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://alchemist.camp/terms
Running 30s test @ https://alchemist.camp/terms
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   276.36ms   21.86ms 850.73ms   98.16%
    Req/Sec    85.48     29.45   171.00     71.90%
  19525 requests in 30.04s, 157.86MB read
Requests/sec:    649.90
Transfer/sec:      5.25MB

Edit: For context, I just ran the same tool against my personal website, which is also running on a $5/month DigitalOcean server, except it’s a static site (Jekyll) served behind nginx.

I know we’re comparing apples and oranges here, but it really puts into perspective how fast Phoenix is, because I imagine even AlchemistCamp’s terms of service page performs at least one DB query?

nick@workstation:/e/tmp$ wrk -t8 -c50 -d30 https://nickjanetakis.com
Running 30s test @ https://nickjanetakis.com
  8 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   109.79ms   17.65ms 339.29ms   92.89%
    Req/Sec    54.07      8.37   101.00     86.20%
  12948 requests in 30.10s, 245.27MB read
Requests/sec:    430.15
Transfer/sec:      8.15MB

nick@workstation:/e/tmp$ wrk -t8 -c200 -d30 https://nickjanetakis.com
Running 30s test @ https://nickjanetakis.com
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   444.70ms   93.91ms   1.82s    90.44%
    Req/Sec    54.24     22.41   158.00     64.06%
  12857 requests in 30.03s, 243.61MB read
  Socket errors: connect 0, read 0, write 0, timeout 20
Requests/sec:    428.08
Transfer/sec:      8.11MB

#13

We need more information here on what the actual server request times are, not what loader.io is reporting. It also sounds like the server is doing a lot of work, but it’s not clear how much work that is.

Elixir’s and Phoenix’s scalability has been proven out in the wild time and time again, but if a server is taking 8 seconds to complete “work”, then no web server or web framework is going to make that 8 seconds go any faster. If it has to be done, it has to be done.

Where Phoenix would really help is if you are indeed blocking on highly latent work: a few such requests won’t nuke the server and your app will remain responsive, whereas in, say, Node.js, any one thing going CPU- or IO-bound will block the entire event loop. The Erlang scheduler will make sure processes get their fair share even if some requests are doing heavy work taking 8s.
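To make that concrete, here’s a toy sketch (nothing from the site itself) of the fairness point: on the BEAM every request runs in its own process, and the scheduler preempts long-running work after a fixed reduction budget, so one heavy request can’t starve the light ones.

defmodule FairnessDemo do
  def run do
    # Simulate one "request" doing seconds of CPU-bound work.
    Task.start(fn -> Enum.reduce(1..100_000_000, 0, &+/2) end)

    # Light "requests" still get serviced promptly alongside it.
    for i <- 1..5 do
      {micros, :pong} = :timer.tc(fn -> Task.await(Task.async(fn -> :pong end)) end)
      IO.puts("light request #{i} took #{micros}µs")
    end
  end
end

Run FairnessDemo.run/0 in iex: the light requests come back almost immediately even while the heavy reduce is still churning.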


#14

It ‘sounds’ like the site is performing multiple large database lookups, combining the results into pages, processing them via a Markdown -> HTML processor, then combining those further, and who knows what else, which is indeed quite a bit of work. It would be nice if we had a GitHub link so we could look at it; it sounds like there are ample optimization opportunities there! ^.^


#15

The worst-case page load time is interesting (perhaps the most interesting thing for tuning), but it often comes from network errors. I wrote a blog post on the impact of network latency, errors, and concurrency on benchmarks.

If you are serving your app with Nginx, then running out of sockets will cause requests to queue up, causing delays in the 5-7 second range.

After that, looking into the state of the server when it’s under load would show where the bottleneck is. The cheap DigitalOcean servers only have one CPU core, so that’s often the fundamental limit on performance.

If you run out of RAM, particularly when calling an external process to do the work, then you can’t spawn OS processes, and the overall system performance will drop by orders of magnitude as it thrashes. When you have lots of OS processes fighting over a limited number of CPUs, you can lose 25% of your CPU performance to task switching.

We took over a PHP site that was having serious performance problems. In addition to a lot of standard web traffic, the site used Elasticsearch to perform complex queries which might take 30 seconds. The site used Apache mod_php, so each process consumed 10-20 MB of RAM and a MySQL DB connection while it sat there waiting for Elasticsearch. The system would run fine until it hit a load spike, then it would quickly collapse. Putting Phoenix in front of the Elasticsearch requests made them much lighter weight, and we could limit the number of concurrent requests to what the system could handle.
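The concurrency-limiting part is the key piece. A minimal sketch of that idea (module and function names are made up, not the code from that project):

defmodule SearchLimiter do
  # Run a batch of slow backend queries, never more than 4 at a time,
  # so a load spike queues work instead of exhausting RAM and DB connections.
  def search_all(queries) do
    queries
    |> Task.async_stream(&do_search/1, max_concurrency: 4, timeout: 60_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end

  # Stand-in for the real Elasticsearch call.
  defp do_search(query) do
    Process.sleep(100)
    {:hits, query}
  end
end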


#16

You’re not wrong at all! It would take some time to decide how and what to publish, but there are still many, many optimization opportunities and I’m just one person who can only put an hour or two a day into this.

When and if traffic is up ~100x-1,000x from what it is now, I’ll have a more pressing need and hopefully the time to really dig into optimization, and I’ll seriously consider your idea of open sourcing the whole site (minus the secrets).


#17

An obvious optimisation is to do the Markdown -> HTML conversion when saving each chunk into the database. That way, you don’t have to do it every time you load a page; this is at the expense of slowing down the CMS admin side slightly.
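As a rough sketch of that idea (assuming Ecto and the Earmark Markdown library; the schema and column names here are made up):

defmodule CMS.Chunk do
  use Ecto.Schema
  import Ecto.Changeset

  schema "chunks" do
    field :body_md, :string
    field :body_html, :string
  end

  def changeset(chunk, attrs) do
    chunk
    |> cast(attrs, [:body_md])
    |> compile_markdown()
  end

  # Convert the Markdown once, on write, so page loads only read HTML.
  defp compile_markdown(changeset) do
    case get_change(changeset, :body_md) do
      nil -> changeset
      md -> put_change(changeset, :body_html, Earmark.as_html!(md))
    end
  end
end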


#18

Hey, all. I put out another video that goes over some of this feedback and clarifies that it’s mostly the custom features slowing things down.

Also, I’m really curious how much performance optimization any of you think even makes sense on a site that only gets a few hundred visits a day.

Since tweaking max_keepalive settings or making a cache with ETS is general and of possible value to everyone, it seemed worth covering, but spending time optimizing code on my site that’s mostly irrelevant to others makes less sense to me until or unless there are actual issues handling the traffic.
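For anyone who hasn’t seen the pattern, the ETS cache in miniature looks something like this (a sketch, not the exact code from the video): a GenServer owns the table so it lives as long as the supervision tree, while callers read and write the table directly, concurrently, without funneling through the GenServer.

defmodule PageCache do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def init(nil) do
    :ets.new(:page_cache, [:named_table, :set, :public, read_concurrency: true])
    {:ok, nil}
  end

  def get(key) do
    case :ets.lookup(:page_cache, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :miss
    end
  end

  def put(key, value), do: :ets.insert(:page_cache, {key, value})
end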


#19

User retention. I would not keep coming back to a site if I had to wait 7-8 seconds on every click.

Isn’t the educational value worth it to you? This is basically production-level work and a priceless experience (unless you’ve already done it 10 times before).


#20

In that video you mentioned that you auto-increment view counts on every page view, and that caching the page would be pointless since the cache would get busted on every view.

I think this is why DHH and team came up with the Russian doll caching strategy in Rails. It lets you cache and invalidate a smaller section of a page. For example, the view count would live in its own cache, and updating the view count would not invalidate the rest of the page’s cache.

The problem with Russian doll caching is that it’s a major pain in the ass to set up, and there are a lot of edge cases and boilerplate to wire it all up. That’s why Rails spent so much time implementing helpers at the framework level to do it.

I think a lot of people got at least partially interested in Elixir / Phoenix because the pitch is that you can get really nice performance without complex caching strategies (and save those strategies for when you’re really large and actually need them).


#21

You can always display things that aren’t really important overall, like view counts, asynchronously, so the rest of the page load stays fairly static (Drab makes it super easy, as an example).
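If you’d rather not pull in Drab, a plain Phoenix channel can do the same thing. A sketch (MyApp.Stats.view_count/1 is a made-up lookup function):

defmodule MyAppWeb.PageChannel do
  use Phoenix.Channel

  def join("page:" <> slug, _params, socket) do
    # Push the count after the join completes, so the page itself
    # renders (and caches) without any per-view data in it.
    send(self(), {:push_views, slug})
    {:ok, assign(socket, :slug, slug)}
  end

  def handle_info({:push_views, slug}, socket) do
    push(socket, "view_count", %{count: MyApp.Stats.view_count(slug)})
    {:noreply, socket}
  end
end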


#22

I would not keep coming back to a site if I had to wait 7-8 seconds on every click.

For sure! If page loads ever took anywhere near that long, performance would be my #1 concern for the project.

Isn’t the educational value worth it to you? This is basically production-level work and a priceless experience (unless you’ve already done it 10 times before).

I worked as a Node dev in the US for a few years (at one unicorn and later a YC startup) and did some Rails contracting as well, so web dev isn’t entirely novel to me, but I’m still at a good place in the learning curve for Elixir and especially its ecosystem.

Mostly, it’s time. I only really have about one or maybe two hours a day I can spend on this stuff and am also recovering from pretty severe RSI that limits keyboard time. I certainly don’t want a broken site experience, but I also have to prioritize what’s useful for either my learning or that of my viewers. So each week it’s a question of “How should I allocate these 5-10 hours?”


#23

My point of view is that the educational value of doing this is bigger for the viewers than for the author, so it’s a very good thing to have material we can point beginners to when they’re hitting their first performance issues.

Now, asking as a beginner myself: couldn’t you just use a GenServer to cache the episodes instead of a GenServer + ETS? Would that affect the cache performance?


#24

Technically, you could easily use an Agent that stores a map of the top 20-50 articles.
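In miniature (illustrative only):

{:ok, _} = Agent.start_link(fn -> %{} end, name: :top_articles)

# Warm the cache (in practice the HTML would come from the DB):
Agent.update(:top_articles, &Map.put(&1, "welcome", "<h1>Welcome</h1>"))

# Read on each page load. Note that every read is serialized
# through the single Agent process.
Agent.get(:top_articles, &Map.get(&1, "welcome"))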


#25

An Agent would quickly become a bottleneck in a system that’s already slow (or maybe not; 200 req/s might not be enough to overload a GenServer). But let’s assume there is a bit more load (>10,000 req/s). If these top articles don’t change often (say, no more than a few times per minute), then compiling them as a list into a module, using something like mochiglobal (https://github.com/lpgauth/foil or https://github.com/discordapp/fastglobal or any other lib like that), would improve the system’s efficiency significantly.
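The trick those libraries implement, stripped to its core (they do it far more carefully, with versioning and code purging):

defmodule TopArticles do
  # Recompile a module whose only job is returning the article list.
  # Data compiled into a module is stored once as constants, so reads
  # involve no copying between processes and no single-process bottleneck.
  def store(articles) do
    body = quote do
      def get, do: unquote(Macro.escape(articles))
    end

    Module.create(TopArticles.Data, body, Macro.Env.location(__ENV__))
  end
end

After TopArticles.store(articles), any process can call TopArticles.Data.get().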

As for

[…] you auto-increment view counts on every page view, and that it would be pointless to update the cache on update since it would get busted on every view.

I’d probably use ETS tables with their built-in counters. If you start experiencing ETS lock contention, there are still faster ways to implement counters on the BEAM when it’s really necessary, such as https://github.com/andytill/oneup. The first step would be to use an ETS table per scheduler, though, before reaching for NIFs like andytill/oneup. There are also counters implemented in Rust, of course, which might seem safer to some people.
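The ETS counter version is only a few lines (write_concurrency spreads contention across the table’s internal locks):

:ets.new(:view_counts, [:named_table, :set, :public, write_concurrency: true])

# Atomically bump a page's count, initializing it to 0 on first view:
:ets.update_counter(:view_counts, "episodes/welcome", {2, 1}, {"episodes/welcome", 0})

# Read it back:
[{_, count}] = :ets.lookup(:view_counts, "episodes/welcome")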

As for caching, I don’t think it’s necessary for rendering pages. Just have the compiled Markdown template as iolists of HTML binaries and splice in the data that changes before pushing it over the socket; that’s basically how the benchmarks I mentioned above were structured, and it was very fast …
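The shape of it, roughly (this is essentially what EEx/Phoenix templates compile down to):

# The static HTML stays as fixed binaries; only the dynamic part varies.
static_head = "<article><h1>"
static_tail = "</h1><p>cached body</p></article>"

# One iolist per render: three references, no concatenation, no copying.
render = fn title -> [static_head, title, static_tail] end

IO.puts(render.("Welcome"))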


#26

It’s funny to me, because I used to work on a site that served 14m visitors a month (I don’t remember the pageviews), and when we were trying to optimise it, we were told peak traffic was 70 requests per second.

So we tried to optimise for 100 requests per second, but it was taking too long, so in the end the managers decided to just throw another £10k per year at the database and call it a day. The pages loaded in 1-3 seconds and they were paying £35k a year in AWS costs.

And I’m looking at these numbers and thinking “we probably could’ve run that for $5 a month”.


#27

Compiling the Markdown template (or even its output) to an iolist would be ideal. Are there any tools for doing this that you’d recommend?


#28

Why not just save both the markdown and the compiled HTML to the database and show the HTML in your templates without any processing (since it’s already compiled)?


#29

A big concatenated string of HTML would definitely be slower than an iolist.


#30

Can you explain why?

What makes loading a snippet of pre-compiled HTML from a database slower than an iolist? I don’t even know what an iolist is.


#31

I don’t even know what an iolist is.

Have a look at these two blog posts:
Elixir and IO Lists, Part 1: Building Output Efficiently
Elixir and IO Lists, Part 2: IO Lists in Phoenix
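The short version, as something you can try in iex: concatenation copies every byte into a brand-new binary, while an iolist just nests references to the existing binaries, and the VM can write them straight to the socket (a writev under the hood).

a = "<p>"
b = "hello"
c = "</p>"

a <> b <> c   # allocates a new 12-byte binary, copying all three parts
[a, b, c]     # an iolist: three references, zero copying
[a, [b, [c]]] # arbitrary nesting is fine; IO functions accept it as-is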