I have an application that I have built in several different architectural variants.
[TLDR: the goal of the application is to let university students enrol into classes for next semester. These systems always failed 20 years ago when I was in uni, and according to my research, they still always fail today.]
To summarize my architectural experiments:
Thousands of people trying to enroll into hundreds of classes with limited capacity
1) just throw it all to Postgres, lock table on writes, cut off when capacity filled
1b) [throw it all to Postgres, no locking, just write, and then read first 30 in class]
2) throw all to a single GenServer that serializes it and writes chunks in bulk to Postgres every x items / y seconds
3) separate GenServer for every single class, that then writes to Postgres in bulk
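To make the "serialize, cut off at capacity, write in bulk" idea concrete, here is a minimal sketch of the per-class variant. It is in Python purely for illustration (the real thing would be a GenServer), and the names `ClassEnrollmentBuffer`, `flush` and `batch_size` are hypothetical. It only models the "every x items" trigger; the "every y seconds" timer is left out.

```python
class ClassEnrollmentBuffer:
    """Accepts enrolments for ONE class in arrival order and flushes them in batches.

    Assumption: `flush` is a hypothetical callback standing in for a bulk INSERT.
    """

    def __init__(self, class_id, capacity, batch_size, flush):
        self.class_id = class_id
        self.capacity = capacity
        self.batch_size = batch_size
        self.flush = flush
        self.enrolled = set()   # student ids already accepted
        self.pending = []       # accepted rows not yet written

    def enroll(self, student_id):
        """Return True if the student got a seat, False otherwise."""
        if student_id in self.enrolled:
            return False        # duplicate request, ignore
        if len(self.enrolled) >= self.capacity:
            return False        # class is full, cut off
        self.enrolled.add(student_id)
        self.pending.append((self.class_id, student_id))
        if len(self.pending) >= self.batch_size:
            self.flush_pending()
        return True

    def flush_pending(self):
        """Write everything accepted so far in one bulk call."""
        if self.pending:
            self.flush(self.pending)
            self.pending = []


written = []  # stands in for the database table
buf = ClassEnrollmentBuffer("math101", capacity=3, batch_size=2, flush=written.extend)
results = [buf.enroll(student) for student in range(5)]
buf.flush_pending()  # final flush, e.g. on a timer or at shutdown
```

Because one buffer handles one class sequentially, the capacity check needs no database lock. The trade-off is that anything accepted but still in `pending` is lost if the process dies before the next flush.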
I am doing some load testing of these variants on localhost: I “deploy” a production build on localhost and then use k6, which creates ~150 concurrent POST requests against an API endpoint created solely for testing purposes. I’ve been able to reach ~200k enrolments in 1 minute on localhost, but that number may well be completely meaningless without understanding the whole context.
Obviously this way of testing is very approximate and skips half of the workflow.
I’d be curious to hear your advice on where to go from here, please.
My inclination would be to:
- deploy this to a real server like fly.io or heroku
- do some load testing that simulates the real human workflow, i.e. basically a human logging in, going to a certain page, hitting the “enroll” button, going to another page, hitting the “enroll” button again, etc.
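As a rough illustration of what "simulating the human workflow" could mean, here is a sketch that generates one user's timeline of requests with random think time between clicks. It is Python for illustration only, and the endpoint paths (`/login`, `/classes/<id>`, `/classes/<id>/enroll`) and the 1 to 5 second think time are assumptions, not my real routes or measured behaviour.

```python
import random


def user_session(class_ids, n_choices, rng, think_time=(1.0, 5.0)):
    """Return a list of (seconds_since_start, method, path) for one simulated user.

    Assumptions: paths are hypothetical, think time is uniform between clicks.
    """
    t = 0.0
    steps = [(t, "POST", "/login")]
    # Each user picks n_choices distinct classes out of everything on offer.
    for class_id in rng.sample(class_ids, n_choices):
        t += rng.uniform(*think_time)   # user navigates to the class page
        steps.append((t, "GET", f"/classes/{class_id}"))
        t += rng.uniform(*think_time)   # user reads, then clicks "enroll"
        steps.append((t, "POST", f"/classes/{class_id}/enroll"))
    return steps


rng = random.Random(42)  # seeded so a test run is repeatable
session = user_session(range(1000), n_choices=30, rng=rng)
```

A timeline like this could be replayed by whatever load tool ends up being used. The point is that the load per user becomes a mix of reads and writes spread over minutes, not one tight burst of POSTs.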
My guesstimate of the real-world scenario is something like 10-30 thousand people trying to enrol in ~1 thousand different classes “all at once”. Each person is trying to enrol in ~30 classes out of that 1 thousand.
Could you please give me some hints on how to test these scenarios, compare those variants, and especially reveal the real bottlenecks of each solution?
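Back-of-the-envelope, those numbers work out as follows. This is a small Python sketch for the arithmetic only. The 30 seats per class is an assumption borrowed from the "first 30 in class" cut-off mentioned earlier, and the 200k per minute figure is my localhost measurement, so it is not a prediction for a real deployment.

```python
def enrollment_load(students, classes_per_student, total_classes,
                    seats_per_class, measured_per_minute):
    """Rough totals for one enrolment rush (all inputs are guesstimates)."""
    attempts = students * classes_per_student
    seats = total_classes * seats_per_class
    return {
        "attempts": attempts,
        "attempts_per_class": attempts / total_classes,  # average contention
        "seats": seats,
        "min_rejected": max(0, attempts - seats),        # at least this many fail
        "seconds_at_measured_rate": attempts * 60 / measured_per_minute,
    }


low = enrollment_load(10_000, 30, 1_000, 30, 200_000)
high = enrollment_load(30_000, 30, 1_000, 30, 200_000)
```

So the range is roughly 300,000 to 900,000 enrolment attempts, i.e. 300 to 900 attempts per class on average, and under the 30-seat assumption the large majority of requests end in a rejection. The rejection path probably deserves as much load-testing attention as the successful write.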
I know I could start fiddling with RabbitMQ/Kafka/GenStage or squeeze in ETS/Redis/Mnesia/Whatnot, but I’d be a headless chicken running here and there without knowing any real data. Now it’s time to understand and measure what my attempts so far can do.
This is becoming a “here be dragons” area for me, so any hints, guidance or mentorship are highly welcome, please. Happy to add you as a collaborator to my repo, if you wish. (btw. should I read any particular book on this?)
Thank you.