If Epic Games used Elixir, could they handle the traffic?

KekKekington · May 14, 2020, 4:25pm

Earlier today Epic Games released GTA V for free as an offer and their website almost immediately crashed. I’m still new to the world of Elixir and OTP so it got me wondering: if they used Elixir for their store backend, could they have handled the traffic? Of course it could also depend on how their OT scales, but I’m curious about what more experienced Elixir developers have to say about this.

Qqwy · May 14, 2020, 5:23pm

I don’t remember names off the top of my head, but I do remember that there are some game studios that indeed use Elixir or Erlang to handle their matchmaking, as well as a company that does in-game analytics using Elixir for the backend.

ksthiele · May 14, 2020, 5:48pm

According to stackshare.io they are using Elixir, but it doesn’t go into much detail about where and how they are using it. https://stackshare.io/epic-games/epic-games

KekKekington · May 14, 2020, 5:53pm

I saw that but didn’t really find anything besides that claim. They have some writeups regarding their use of AWS but they never mention Elixir.

KekKekington · May 14, 2020, 5:55pm

Seems like Square Enix and Riot Games use it.

wanton7 · May 14, 2020, 7:08pm

According to this they use JVM for their store backend https://gamejobs.co/Senior-Backend-Engineer-at-Epic-Games

Nicd · May 14, 2020, 7:36pm

The answer is, we cannot say. Elixir is not a silver bullet. For example, Elixir in itself cannot prevent the database or caches from being overloaded. Sure, it provides you with tools to handle concurrency more easily than some other technologies, but to really scale you need to put in work, and most often your database ends up being the bottleneck anyway.

Qqwy · May 14, 2020, 8:58pm

Magnus Henoch and Ramón Lastres gave a presentation at the virtual BEAM Meetup of 2020-04-23 about how Elixir is used at their company, GameAnalytics. The recorded video of the meetup can be found here.

hauleth · May 15, 2020, 12:15am

No

It was a pleasure helping you

KekKekington · May 15, 2020, 12:54am

As a beginner you get bombarded with all these claims of redundancy, thousands of processes, etc. but it is important to keep in mind that anything can happen in the real world. Thanks for explaining.

Just being curious, if you don’t mind, how would you “solve” the bottleneck?

Nicd · May 15, 2020, 4:45am

For the DB, I don’t really know as I’ve never dealt with that scale. All my problems have been solvable with caching, better queries, and bigger hardware.

mudasobwa · May 15, 2020, 4:56am

I could not disagree more. This answer is extremely inaccurate semantically.

could they’ve handled the traffic?

Yes

would they’ve handled the traffic?

No (here I tend to agree).

hauleth · May 15, 2020, 7:56am

I mean that Elixir would not change their ability to handle such traffic.

lucaong · May 15, 2020, 8:17am

Exactly this. The ability to “handle the traffic” depends very weakly on the language, and much more heavily on infrastructure, budget, etc.

As a result, questions like the one in this post are impossible to answer meaningfully.

In other words, my answer would be yes, they could have handled it, but no, they probably would not have handled it just by switching to Elixir with no other change. And then again, my answer would be the same for basically any language.

wanton7 · May 15, 2020, 9:15am

To add, we don’t know why the website crashed. If it was because of a multi-threaded data race, then Elixir would have helped, because those are impossible in a language where all data is immutable, as in every BEAM language. If it was because requests to the backend timed out, then a language runtime that splits time more equally between lightweight processes/green threads might help keep the server responsive longer, like Go (especially the new 1.14, which has a preemptive scheduler) or anything running on the BEAM, like Elixir. But as Nicd wrote, the bottleneck is usually the database or just the overall design of the backend.

More info about Go’s new preemptive scheduler: https://medium.com/a-journey-with-go/go-asynchronous-preemption-b5194227371c

aenglisc · May 15, 2020, 10:08am

Without knowing exactly why that happened, it’s anybody’s guess. It is however fairly unlikely that their language of choice is the key culprit (if it is one at all).

dimitarvp · May 16, 2020, 7:09am

Elixir, by its nature of living inside the BEAM VM, would be able to handle traffic close to what a machine’s network could handle before buckling, all other things being optimal (so, not being bogged down by connection limits to a database). And since the runtime (the BEAM) does its best to give fair treatment of all processes – in this case network requests – I believe that yes, their server(s) could have handled more load.

Many other tech stacks just start timing out when faced with too much traffic. Back when I was working with Ruby on Rails and the Puma server, the scenario of having a pool of 50 workers and then getting a spike of 1000 users almost at the same time was met with “oh well then, bad luck” and a shrug from the programmers and the business owners. A lot of tech has been invented that implements pooling – you get a request, put it for processing in one worker in the pool, and hope that it won’t take 30 secs, because by then the network requests waiting behind it will have timed out in their client apps. It’s basically how most of the world’s public-facing services work. Welcome to the glorious world of programming!
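To put rough numbers on that pool scenario, here is a back-of-the-envelope sketch in Python. Only the 50 workers, the 1000 near-simultaneous users, and the 30 sec timeout come from the story above; the 2 sec per-request service time, the FIFO queueing, and the function name `timed_out_requests` are assumptions made purely for illustration, not measurements from any real system.

```python
def timed_out_requests(burst, workers, service_secs, client_timeout_secs):
    """Count requests in a simultaneous burst that finish after the client timeout.

    Assumptions (hypothetical): every request takes the same service time,
    all requests arrive at t=0, and workers pick them up in FIFO order.
    """
    timed_out = 0
    for position in range(burst):
        # Requests are served in "waves" of `workers` at a time.
        wave = position // workers
        finish_time = (wave + 1) * service_secs
        if finish_time > client_timeout_secs:
            timed_out += 1
    return timed_out


# 1000 users, 50 workers, assumed 2 s per request, 30 s client timeout.
print(timed_out_requests(1000, 50, 2.0, 30.0))  # 250
```

Under those assumptions the last five waves (250 requests) finish after the 30 sec mark, even though no single request was slow. With an assumed 0.5 sec service time the same burst clears in 10 secs and nothing times out, which is why per-request latency under load matters so much here.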

I never did work on something in Elixir that made a server buckle under load, and I know Elixir is not magic and would still crash eventually, of course (there are physical limits after all). But what I’ve observed many times during my work with Elixir is that it performs with much less lag during a stressful situation with a lot of load on the server. This alone would likely prevent most timeouts and the dreaded “service currently unavailable” on release days.

…All of that won’t help one bit if the server needs to open 50_000 connections to a DB server however.

As others said, we simply don’t know the circumstances.

I did once scale an app to 5 servers with independent caches (local / per-node) and that seriously improved the latency when that app was having bursts of traffic. But if your app relies on a limited external resource then the language of choice matters very little.
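For anyone wondering what an independent per-node cache looks like in principle, here is a minimal sketch in Python (not the code from that app). The class name `LocalTTLCache`, the `fetch`/`load` interface, and the time-to-live expiry policy are all assumptions for illustration. The point is only that each server answers repeat reads from its own memory and hits the shared resource once per key per TTL window.

```python
import time


class LocalTTLCache:
    """A per-process, in-memory cache with time-based expiry (illustrative only)."""

    def __init__(self, ttl_secs, clock=time.monotonic):
        self._ttl = ttl_secs
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def fetch(self, key, load):
        """Return the cached value for `key`, calling `load(key)` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        # Miss or stale entry: go to the slow shared resource (e.g. the database).
        value = load(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Usage would be something like `cache.fetch("store:front_page", load_from_db)`, where `load_from_db` is a hypothetical function that queries the database. Each node may serve data up to one TTL out of date, which is the trade-off for not needing any coordination between the nodes.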
