Earlier today Epic Games released GTA V for free as an offer and their website almost immediately crashed. I’m still new to the world of Elixir and OTP so it got me wondering: if they used Elixir for their store backend, could they’ve handled the traffic? Of course it could also depend on how their OT scales but I’m curious about what more experienced Elixir developers have to say about this.
I don’t remember names off the top of my head, but I do remember that there are some game studios that do use Elixir or Erlang to handle their matchmaking, as well as a company that does in-game analytics using Elixir for the backend.
I saw that but didn’t really find anything besides that claim. They have some writeups regarding their use of AWS but they never mention Elixir.
Seems like Square Enix and Riot Games use it.
According to this, they use the JVM for their store backend: https://gamejobs.co/Senior-Backend-Engineer-at-Epic-Games
The answer is, we cannot say. Elixir is not a silver bullet. For example, Elixir in itself cannot prevent the database or caches from being overloaded. Sure, it provides you with tools that make concurrency easier to handle than in some other technologies, but to really scale you need to put in work, and most often your database ends up being the bottleneck anyway.
Magnus Henoch and Ramón Lastres gave a presentation at the virtual BEAM Meetup of 2020-04-23 about how Elixir is used at their company, GameAnalytics. The recorded video of the meetup can be found here.
It was a pleasure helping you
As a beginner you get bombarded with all these claims of redundancy, thousands of processes, etc. but it is important to keep in mind that anything can happen in the real world. Thanks for explaining.
Just being curious, if you don’t mind, how would you “solve” the bottleneck?
For the DB, I don’t really know as I’ve never dealt with that scale. All my problems have been solvable with caching, better queries, and bigger hardware.
I could not disagree more. Semantically, this answer is very inaccurate.
could they’ve handled the traffic?
would they’ve handled the traffic?
No. Here I tend to agree.
I mean that Elixir would not change their ability to handle such traffic.
Exactly this. The ability to “handle the traffic” depends very weakly on the language, and much more heavily on infrastructure, budget, etc.
As a result, questions like the one in this post are impossible to answer meaningfully.
In other words, my answer would be yes, they could have handled it, but no, they probably would not have handled it just by switching to Elixir with no other change. And then again, my answer would be the same for basically any language.
To add: we don’t know why the website crashed. If it was because of a multi-threaded data race, then Elixir would have helped, because those are impossible in a language where all data is immutable, as in every BEAM language. If it was because requests to the backend timed out, then a language runtime that splits time more equally between lightweight processes/green threads might help keep the server responsive longer, like Go (especially the new 1.14, which has a preemptive scheduler) or anything running on the BEAM, like Elixir. But like Nicd wrote, the bottleneck is usually the database or just the overall design of the backend.
More info about Go’s new preemptive scheduler: https://medium.com/a-journey-with-go/go-asynchronous-preemption-b5194227371c
Without knowing exactly why that happened, it’s anybody’s guess. It is however fairly unlikely that their language of choice is the key culprit (if it is one at all).
Elixir, by its nature of living inside the BEAM VM, would be able to handle traffic close to what a machine’s network could handle before buckling, all other things being optimal (so, not being bogged down by connection limits to a database). And since the runtime (the BEAM) does its best to give fair treatment of all processes – in this case network requests – I believe that yes, their server(s) could have handled more load.
Many other tech stacks just start timing out when faced with too much traffic. Back when I was working with Ruby on Rails and the Puma server, the scenario of having a pool of 50 workers and getting a spike of 1000 users almost at the same time was met with “oh well then, bad luck” and a shrug from the programmers and the business owners. A lot of tech has been invented that implements pooling: you get a request, hand it to one worker in the pool for processing, and hope that it won’t take 30 secs before the next waiting network requests time out in their client apps. It’s basically how most of the world’s public-facing services work. Welcome to the glorious world of programming!
I never did work on something in Elixir that made a server buckle under load, and I know Elixir is not magic and would still crash eventually, of course (there are physical limits after all). But what I’ve observed many times during my work with Elixir is that it performs with much less lag in a stressful situation with a lot of load on the server. This alone would likely prevent most timeouts and the dreaded “service currently unavailable” on release days.
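To put rough numbers on the pool scenario above, here is a minimal back-of-the-envelope sketch in Python. The 50 workers, 1000 simultaneous users and 30-second client timeout come from the post; the 2-second service time per request and the function name `simulate_burst` are assumptions made up for illustration, not measurements from any real app.

```python
def simulate_burst(requests, workers, service_time_s, client_timeout_s):
    """Count requests served vs. timed out when a burst hits a fixed pool.

    Simplifying assumptions: all requests arrive at t=0, every request
    takes exactly service_time_s, and the pool works through the queue
    in waves of `workers` requests at a time.
    """
    served = 0
    timed_out = 0
    for i in range(requests):
        wave = i // workers                       # which wave picks up request i
        finish_time = (wave + 1) * service_time_s
        if finish_time <= client_timeout_s:
            served += 1
        else:
            timed_out += 1                        # the client gave up before a reply
    return served, timed_out


# 50 workers and 1000 users from the post; 2 s per request is assumed.
served, timed_out = simulate_burst(1000, 50, 2.0, 30.0)
print(served, timed_out)  # → 750 250
```

Under these assumed numbers a quarter of the users see a timeout even though no worker ever fails. A fairer scheduler changes how the waiting is shared out, not the total amount of work the machine has to get through.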
…None of that will help one bit, however, if the server needs to open 50_000 connections to a DB server.
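One common way to avoid opening a connection per request is to cap concurrent DB access with a fixed-size pool, so that extra requests wait for a slot instead of opening more connections. Below is a minimal Python sketch of that idea. `BoundedConnectionPool` and `fake_query` are hypothetical stand-ins invented for illustration, and no real database driver is involved.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedConnectionPool:
    """Hypothetical stand-in for a DB pool: caps concurrent 'connections'."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self._in_use = 0
        self.peak = 0  # highest number of simultaneous 'connections' seen

    def run(self, query):
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._in_use += 1
                self.peak = max(self.peak, self._in_use)
            try:
                return query()
            finally:
                with self._lock:
                    self._in_use -= 1


def fake_query():
    # Assumed placeholder for a real query; just sleeps briefly.
    time.sleep(0.001)
    return "row"


pool = BoundedConnectionPool(max_connections=5)
with ThreadPoolExecutor(max_workers=32) as executor:
    futures = [executor.submit(pool.run, fake_query) for _ in range(200)]
    results = [f.result() for f in futures]

print(len(results), pool.peak)  # peak never exceeds max_connections
```

The trade-off is the one described above: the database stays protected, but requests now queue for a slot, so the DB is still the bottleneck regardless of the language in front of it.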
As others said, we simply don’t know the circumstances.
I did once scale an app to 5 servers with independent caches (local / per-node) and that seriously improved the latency when that app was having bursts of traffic. But if your app relies on a limited external resource then the language of choice matters very little.
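For a feel of what a local, per-node cache does, here is a minimal sketch in Python. It is not the setup from the app mentioned above: `LocalTTLCache`, `load_offer` and the 5-second TTL are all assumptions for illustration. Each server process would hold its own instance, so nodes never coordinate, at the cost of serving data that may be up to one TTL stale.

```python
import time


class LocalTTLCache:
    """In-process cache with a time-to-live; not thread-safe, sketch only."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: the backend is not touched
        value = loader(key)            # miss or expired: go to the backend
        self._entries[key] = (now + self._ttl_s, value)
        return value


backend_calls = []


def load_offer(key):
    # Hypothetical slow backend/DB lookup; records each call it receives.
    backend_calls.append(key)
    return f"offer:{key}"


cache = LocalTTLCache(ttl_s=5.0)
for _ in range(1000):
    cache.get_or_load("gta-v", load_offer)

print(len(backend_calls))
```

As long as the burst fits inside the TTL, 1000 lookups of the same key cost the backend a single call per node, which is why this kind of cache helps so much with traffic spikes on a few hot items.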