josefrichter

Load testing advice

I have an application with several different versions of architecture.

[TLDR: the goal of the application is to let university students enrol into classes for next semester. These systems always failed 20 years ago when I was in uni, and according to my research, they still always fail today.]

To summarize my architectural experiments:

Thousands of people trying to enroll into hundreds of classes with limited capacity

  1. just throw it all to postgres, lock table on writes, cut off when capacity filled
    [1b) throw it all to postgres, no locking, just write, and then read first 30 in class]
  2. throw all to single genserver that serializes it and writes chunks in bulks to postgres every x-items/y-seconds
  3. separate genserver for every single class, that then writes to postgres in bulk
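To make the difference between variant 1 and variant 1b concrete, here is a minimal sketch of 1b ("just write, then read the first 30"). It uses Python's built-in `sqlite3` as a stand-in for Postgres, and the table and function names (`attempts`, `record_attempt`, `enrolled`) are made up for illustration, not taken from the actual app.

```python
import sqlite3

CAPACITY = 30  # seats per class, taken from the "first 30" in variant 1b

conn = sqlite3.connect(":memory:")  # in-memory stand-in for Postgres
conn.execute(
    "CREATE TABLE attempts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # insertion order
    " student_id INTEGER NOT NULL,"
    " class_id INTEGER NOT NULL,"
    " UNIQUE (student_id, class_id))"
)

def record_attempt(student_id, class_id):
    """Write path: append only, no capacity check; repeated clicks are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO attempts (student_id, class_id) VALUES (?, ?)",
        (student_id, class_id),
    )

def enrolled(class_id, capacity=CAPACITY):
    """Read path: the first `capacity` attempts in insertion order win."""
    rows = conn.execute(
        "SELECT student_id FROM attempts WHERE class_id = ? ORDER BY id LIMIT ?",
        (class_id, capacity),
    )
    return [row[0] for row in rows]

# 50 students try to get into class 7, which has 30 seats.
for student in range(1, 51):
    record_attempt(student, class_id=7)
record_attempt(1, class_id=7)  # duplicate click from student 1

winners = enrolled(7)
print(len(winners), "students enrolled in class 7")
```

The write path never blocks on capacity, which is the appeal of 1b. One caveat to check on the real database: in Postgres a sequence-assigned id reflects the order in which ids were handed out, which under concurrency is not necessarily the order in which transactions committed.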

I am doing some load testing of these variants on localhost - I “deploy” a production version on localhost and then use K6 that can create ~150 concurrent POST requests to my API endpoint created solely for testing purposes. I’ve been able to reach ~200k enrolments in 1 minute on localhost, but that number might as well be completely meaningless without fully understanding the whole context.

Obviously this way of testing is very approximate and skipping half of the workflow.

I’d be curious to hear your advice where to go from here, please.

My inclination would be to:

  1. deploy this to real server like fly.io or heroku
  2. do some load testing that simulates the real human workflow, i.e. basically a human logging in, going to certain page, hit “enroll” button, going to another page, hit “enroll” button again, etc.
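The "real human workflow" in step 2 could be modelled roughly like this. It is only a sketch: the paths (`/login`, `/classes/<id>`, `/classes/<id>/enrol`) and the `send` callback are hypothetical, and a stub transport is used so the example runs without a server. For a real run, `send` would wrap an HTTP client pointed at the deployed app, or the same journey would be expressed in a load testing tool.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def student_journey(send, student_id, class_ids, think_time=(0.0, 0.0)):
    """One simulated student: log in, then open and enrol class by class.

    `send(method, path, payload)` is a hypothetical transport callback that
    returns an HTTP-like status code.
    """
    results = {"login": send("POST", "/login", {"student": student_id})}
    for class_id in class_ids:
        time.sleep(random.uniform(*think_time))  # human pause between clicks
        send("GET", f"/classes/{class_id}", None)
        results[class_id] = send(
            "POST", f"/classes/{class_id}/enrol", {"student": student_id}
        )
    return results

def run_load(send, n_students, all_classes, picks_per_student, workers=50):
    """Run many journeys concurrently and collect the per-student results."""
    def one(student_id):
        rng = random.Random(student_id)  # reproducible class choice per student
        picks = rng.sample(all_classes, picks_per_student)
        return student_journey(send, student_id, picks)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, range(n_students)))

# Stub transport: records every call and pretends everything succeeds.
calls = []

def fake_send(method, path, payload):
    calls.append((method, path))
    return 200

results = run_load(
    fake_send, n_students=20, all_classes=list(range(100)), picks_per_student=5
)
print(len(results), "journeys,", len(calls), "requests")
```

Setting `think_time` to something like `(1.0, 5.0)` makes the traffic shape closer to people clicking through pages than to a tight loop of POST requests.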

My guesstimate of real-world scenario is something like 10-30 thousand people trying to enrol to ~1 thousand different classes “all at once”. Each person is trying to enrol to ~30 classes out of that 1 thousand.
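As a back-of-the-envelope check, the guesstimate above can be turned into a total number of enrolment attempts and compared with the ~200k per minute measured on localhost. The only inputs are the numbers already mentioned in this post; the comparison assumes (optimistically) that the localhost figure would carry over to production.

```python
students_low, students_high = 10_000, 30_000  # "10-30 thousand people"
classes_per_student = 30                      # "~30 classes" each
measured_per_minute = 200_000                 # localhost k6 result

attempts_low = students_low * classes_per_student
attempts_high = students_high * classes_per_student

# Time to drain all attempts if the localhost throughput held in production.
minutes_low = attempts_low / measured_per_minute
minutes_high = attempts_high / measured_per_minute

print(f"total attempts: {attempts_low:,} to {attempts_high:,}")
print(f"minutes at measured rate: {minutes_low} to {minutes_high}")
```

So the whole rush is somewhere between 300,000 and 900,000 attempts, which at the measured rate would take roughly 1.5 to 4.5 minutes. That is a useful target to hold each variant against once it runs on real infrastructure.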

Could you please give me some hints how to test these scenarios, compare those variants, and especially reveal the real bottlenecks of each solution?

I know I could start fiddling with RabbitMQ/Kafka/GenStage or squeeze in ETS/Redis/Mnesia/Whatnot, but I’d be a headless chicken running here and there without knowing any real data. Now it’s time to understand and measure what my attempts so far can do.

This is becoming a “here be dragons” area for me, so any hints, guidance or mentorship are highly welcome, please. Happy to add you as a collaborator to my repo, if you wish. (By the way, should I read any particular book on this?)

Thank you.

Most Liked Responses

akoutmos

Author of Build a Weather Station with Elixir and Nerves

I may be a little biased on this one given that I am the author of the library, but I would include PromEx as a dependency of your project and capture the BEAM, Phoenix, and Ecto metrics while running your K6 test suite against each architecture. That way you can test in a production-esque environment and see how your system behaves. I actually wrote a blog post about how to set all this up on Fly.io, if that helps: “Monitoring Elixir Apps on Fly.io With Prometheus and PromEx” (The Fly Blog).

LostKobrakai

Maybe it’s not everybody, but I’d expect exactly that to be the problem why those systems fail.

These things to my knowledge need to be fair, which usually means nobody can be denied their chance from within the system.

In general I’d also try to figure out how fast students need confirmation of their enrolments. In other words, can you accept enrolment attempts and only later come back to the student to tell them whether they were successfully enrolled in the course? This “later” doesn’t need to be long; even resolving once a minute could allow you to do certain checks less often than once per request, potentially even on a separate node, by splitting writes from reads. It’ll also allow you to better cache/cachebust read-heavy parts of the system, which will likely be hit hard as well, especially when they become the success indicators. CQRS in general can be a good step towards an event-driven system, which allows for a few infrastructure scenarios useful for scaling things independently as needed.
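A minimal in-memory sketch of that "accept now, confirm later" split might look like the following. The class and method names (`EnrolmentService`, `accept`, `resolve_batch`) are invented for illustration, and a real system would persist the pending attempts rather than hold them in a Python deque.

```python
from collections import defaultdict, deque

class EnrolmentService:
    """Write side accepts attempts cheaply; a periodic job decides outcomes."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)   # class_id -> number of seats
        self.pending = deque()               # accepted but undecided attempts
        self.enrolled = defaultdict(list)    # class_id -> enrolled students
        self.status = {}                     # read side: (student, class) -> state

    def accept(self, student_id, class_id):
        """Per-request path: record the attempt and answer immediately."""
        key = (student_id, class_id)
        if key not in self.status:           # repeated clicks change nothing
            self.status[key] = "pending"
            self.pending.append(key)
        return self.status[key]

    def resolve_batch(self):
        """Periodic path (e.g. once a minute): decide in arrival order."""
        while self.pending:
            student_id, class_id = self.pending.popleft()
            seats = self.enrolled[class_id]
            if len(seats) < self.capacities.get(class_id, 0):
                seats.append(student_id)
                self.status[(student_id, class_id)] = "enrolled"
            else:
                self.status[(student_id, class_id)] = "rejected"

svc = EnrolmentService({"math": 2})
for student in ("ann", "bob", "cat"):
    svc.accept(student, "math")
before = dict(svc.status)   # everything is still "pending" here
svc.resolve_batch()
after = dict(svc.status)
print(after)
```

Students poll (or get pushed) the `status` view, which is cheap to cache because it only changes when a batch is resolved.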

stefanchrobot

I’m not an expert on this, but here are my thoughts.

With load testing I would definitely try to get as close to the production infrastructure as possible to get meaningful results. This includes:

  • The amount of servers (application and DB),
  • The CPU and RAM config of the servers,
  • The DB config,
  • Intermediate servers (e.g. proxies),
  • The amount of data already in the DB.

As far as I understood, you’re building a new app, so I’d go and create two identical production environments and dedicate one to load testing.

+1 for having some initial guesstimates on the amount of traffic. More things to think about:

  • I’m guessing not everybody is going to sit there on minute one and try to sign up for the classes,
  • Are people allowed to open multiple tabs at once and try to sign up that way?
  • Are they going to be able to prepare an open tab, keep refreshing it and then hit the “sign up” button? Or will they be able to log in to the system only after a specified point in time?

What I’d be trying to do with the questions above is to predict the behaviour patterns of the users. Then it would be great to replicate them in an automated test. If the app is a SPA, I think I would go with testing it via the API. If it’s SSR, I would replicate what the browser does. Either way, these tests could still be written in Elixir, but it should be a separate app (don’t use Phoenix test helpers as they short circuit certain things; use some HTTP client and Floki).

My prediction for the first run is that it would fail because of:

  • The server not being configured to accept that many connections at once,
  • Too few DB connections,
  • DB connection queueing timeouts,
  • Testing tool not being able to produce the needed load (that’s an argument for using a well established load testing tool).
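For the "too few DB connections" and queueing-timeout items, Little's law (average concurrency = arrival rate × time in system) gives a rough first estimate of how many connections are busy at a given load. The sketch below uses the ~200k per minute figure from the original post; the per-query latencies of 5 ms and 50 ms are pure assumptions, to be replaced by measured values.

```python
import math

def connections_needed(requests_per_second, seconds_per_query, queries_per_request=1):
    """Little's law: average busy connections = arrival rate x time per query."""
    return math.ceil(requests_per_second * queries_per_request * seconds_per_query)

peak_rps = 200_000 / 60  # ~3,333 requests/s, from the localhost measurement

fast = connections_needed(peak_rps, seconds_per_query=0.005)  # assumed 5 ms
slow = connections_needed(peak_rps, seconds_per_query=0.050)  # assumed 50 ms

print(f"busy connections at 5 ms per query:  {fast}")
print(f"busy connections at 50 ms per query: {slow}")
```

This is an average, so bursts need headroom on top. It also shows why lock contention hurts so much: if waiting on a lock stretches query time tenfold, the number of connections needed grows tenfold too, and requests start queueing for a connection and eventually time out.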

One thing I would consider is to step back and think about whether it’s possible to have a product (not technical) design which would make some of those problems disappear. The UX doesn’t have to be great - people use it once per semester. Maybe there’s a way to spread out the load, e.g. let more senior students sign up first?
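As one possible shape for "let more senior students sign up first", the sketch below assigns each cohort its own opening time. The cohort names, the start time and the 30-minute gap are all made-up values for illustration.

```python
from datetime import datetime, timedelta

def opening_times(start, cohorts, gap_minutes):
    """Map each cohort to its window opening; cohorts are listed most senior first."""
    return {
        cohort: start + timedelta(minutes=index * gap_minutes)
        for index, cohort in enumerate(cohorts)
    }

def may_enrol(cohort, now, schedule):
    """A student may enrol once their cohort's window has opened."""
    return now >= schedule[cohort]

schedule = opening_times(
    start=datetime(2024, 9, 2, 9, 0),          # hypothetical enrolment start
    cohorts=["year 4", "year 3", "year 2", "year 1"],
    gap_minutes=30,
)

now = datetime(2024, 9, 2, 9, 45)
open_cohorts = [c for c in schedule if may_enrol(c, now, schedule)]
print(open_cohorts)
```

With four cohorts, the peak load per window is roughly a quarter of the "all at once" scenario, at the cost of junior students seeing fewer free seats.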
