GenServer Memory: Best practices for large internal state accumulation

henrysdev · May 27, 2021, 2:44pm

I have a multiplayer game backend that hosts many simultaneous game rooms as GenServers under a dynamic supervisor. Over the course of a game, one of the responsibilities of a game room GenServer is to track/accumulate the state of the game from beginning to end. After the game ends, the state is written to the database for tracking purposes (recording the result of each round, who won, etc etc).

As I add more and more intricate game state to be tracked, I’m starting to wonder about the scalability of such an approach.

Is there a limit on how large the internal state of a GenServer can be? Is there a good way to benchmark/graph the memory usage for a single GenServer using observer or something else? Will GenServer performance suffer due to having a huge internal state map (assuming immutable copies need to be made)?

Alternatively, I could persist and flush the game state data as I collect it rather than do one big series of DB writes at the end. However, the data model is hierarchical in nature so this would be a bit wonky. I’ve also considered storing the historical game state in an Agent or ETS temporarily before game end - would love any insight/second opinions!

lud · May 27, 2021, 4:01pm

Hi,

First you have to define “large”. Are we talking about kilobytes or megabytes?

Do you have a recovery solution for your state if the GenServer crashes? That would be part of the answer: if you need to recover your state from any point of the game, then you have to write it to a database, so you will have to persist your state every N seconds.
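As a rough illustration of "persist every N seconds", here is a minimal sketch of a GenServer that checkpoints its accumulated events on a timer. The module name `CheckpointedRoom`, the `:persist_fun` and `:interval` options, and the event shape are all hypothetical; `persist_fun` stands in for whatever database write you would actually use.

```elixir
defmodule CheckpointedRoom do
  use GenServer

  # Hypothetical default; pick N based on how much progress you can afford to lose.
  @default_interval :timer.seconds(5)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def record(pid, event), do: GenServer.cast(pid, {:record, event})

  @impl true
  def init(opts) do
    # persist_fun is a placeholder for a real database write (assumption).
    persist_fun = Keyword.get(opts, :persist_fun, fn _events -> :ok end)
    interval = Keyword.get(opts, :interval, @default_interval)
    schedule_flush(interval)
    {:ok, %{events: [], persist_fun: persist_fun, interval: interval}}
  end

  @impl true
  def handle_cast({:record, event}, state) do
    # Prepend for O(1) inserts; the list is reversed when flushed.
    {:noreply, %{state | events: [event | state.events]}}
  end

  @impl true
  def handle_info(:flush, state) do
    # Hand the events, oldest first, to the persistence callback.
    state.persist_fun.(Enum.reverse(state.events))
    schedule_flush(state.interval)
    {:noreply, state}
  end

  defp schedule_flush(interval), do: Process.send_after(self(), :flush, interval)
end
```

This version rewrites the whole history on every flush, which keeps the sketch short; a real implementation would more likely persist only what changed since the last checkpoint.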

An Agent is a GenServer; its state is stored in memory just like your game server's, so that would not solve your scalability problem.

ETS tables are also stored in memory, and there is a limit on the number of tables (~1400 by default, though that may have changed). ETS would let your state survive while your GenServer crashes and recovers (provided the table is owned by another process), but I would not create one table per game to store a history.
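One way to avoid a table per game, sketched here as an assumption rather than a recommendation from the thread, is a single shared table keyed by `{game_id, round}`. The table name `:game_history` and the row shape are hypothetical. Note that an ETS table is destroyed when its owner process exits, so the owner should be something other than the game GenServer if the data has to outlive a crash.

```elixir
# One shared table for all games, created once (for example at application start).
table = :ets.new(:game_history, [:ordered_set, :public])

# Rows are keyed by {game_id, round_number}; the values are illustrative.
:ets.insert(table, {{"room-1", 1}, %{winner: "alice"}})
:ets.insert(table, {{"room-1", 2}, %{winner: "bob"}})
:ets.insert(table, {{"room-2", 1}, %{winner: "carol"}})

# Fetch every round of one game; sort explicitly so the order is by round number.
room_1_rounds =
  table
  |> :ets.match_object({{"room-1", :_}, :_})
  |> Enum.sort()

# Once the game is over and written to the database, drop its rows.
:ets.match_delete(table, {{"room-1", :_}, :_})
remaining_rows = :ets.info(table, :size)
```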

If your history is a kind of list, then it would probably fit in a PostgreSQL table. You can just store a bunch of JSON, binary JSON, or even a serialized version of your state (:erlang.term_to_binary) in a single column: one table for the games, and one table for the history items. There are also NoSQL databases, and there is Mnesia. If you don’t write after every operation, performance should not be impacted much.
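To make the `:erlang.term_to_binary` option concrete, here is a small round-trip sketch. The state shape is made up for illustration, and storing the resulting binary in a PostgreSQL `bytea` column is an assumption about how you would wire it up.

```elixir
# Hypothetical game state; the shape is illustrative only.
state = %{
  game_id: "room-42",
  rounds: [
    %{round: 1, winner: "alice", submissions: %{"alice" => %{"score" => 10}}}
  ]
}

# Serialize the whole term to a binary; :compressed trades CPU for a smaller blob.
blob = :erlang.term_to_binary(state, [:compressed])

# Decode with :safe so that untrusted data cannot create new atoms.
restored = :erlang.binary_to_term(blob, [:safe])

IO.inspect(byte_size(blob), label: "serialized bytes")
IO.inspect(restored == state, label: "round trip ok")
```

The trade-off compared with JSON is that the blob is opaque to SQL queries and only readable from the BEAM, so it suits data you write once and rarely inspect from the database side.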

henrysdev · May 27, 2021, 4:32pm

Ah yes - let me give more context

Games are pretty ephemeral; they are tracked entirely in memory during gameplay. They only last about 5 minutes and are not high stakes. I’m not worried about recovering a game in progress if it crashes.
A game consists of multiple rounds (between 3 and 10).
A game room has between 2 and 10 players in it, but I’m hoping to have many more at some point…
Every round, every player in the game submits a JSON blob payload to the server with a max size of ~4kb. This is where significant memory usage would accumulate over time in storing the history. I ultimately store these blobs in a JSON column in Postgres.

While I do know that Agents and ETS are also just in-memory storage solutions, I’m wondering if it’d be advantageous to split up the state between multiple processes vs just the one GenServer.

APB9785 · May 27, 2021, 4:40pm

So, you’re using one GenServer per game, which might go from ~4kb to ~400kb over ~5mins, and then at the end of the game, the GenServer saves some of the data to the database, and kills itself? Is that right?

Sounds perfectly reasonable to me.
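For reference, the ~400kb figure follows from the upper bounds given earlier in the thread (10 rounds, 10 players, ~4kb per payload). A quick sketch of the arithmetic, with variable names of my own choosing:

```elixir
# Upper bounds taken from the thread.
max_rounds = 10
max_players = 10
max_payload_bytes = 4 * 1024

# Worst case: every player submits a full-size payload in every round.
worst_case_bytes = max_rounds * max_players * max_payload_bytes

IO.puts("worst case: #{div(worst_case_bytes, 1024)} KB")
# prints "worst case: 400 KB"
```

This counts raw payload bytes only. If the payloads are decoded into maps instead of being kept as binaries, the in-memory footprint will differ.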

henrysdev · May 27, 2021, 4:54pm

Thanks for the napkin math

Yes that’s correct. I take it 400kb (or even as much as ~1-2mb to be ultra conservative) isn’t an issue for one GenServer in terms of performance?

APB9785 · May 27, 2021, 7:47pm

I just started up a GenServer to hold a list of strings, and added ~1 million characters at a time until I got to several hundred MB of memory usage. There was no noticeable slowdown, either when inserting new data or when retrieving it. Obviously this is not a very robust test, but it’s enough for me to think that if there is a performance issue, it would probably come from an inefficient update algorithm or insufficient hardware, rather than from a limitation of GenServer itself.
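An experiment along those lines might look like the sketch below. This is not the exact code used above; the module name `Hoarder` and the step count are made up. One thing to keep in mind when reading the numbers: binaries larger than 64 bytes live off the process heap, so `Process.info(pid, :memory)` alone will not show them. `Process.info(pid, :binary)` lists the off-heap binaries the process references, but its format is implementation-specific and may change between OTP releases.

```elixir
defmodule Hoarder do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [])
  def add(pid, string), do: GenServer.call(pid, {:add, string})
  def count(pid), do: GenServer.call(pid, :count)

  @impl true
  def init(list), do: {:ok, list}

  @impl true
  def handle_call({:add, string}, _from, list), do: {:reply, :ok, [string | list]}
  def handle_call(:count, _from, list), do: {:reply, length(list), list}
end

{:ok, pid} = Hoarder.start_link()

Enum.each(1..20, fn step ->
  # ~1 million characters per insert, as in the experiment described above.
  chunk = String.duplicate("x", 1_000_000)

  # Time a single insert, including the call round trip.
  {micros, :ok} = :timer.tc(fn -> Hoarder.add(pid, chunk) end)

  # Heap, stack, and internal structures of the process (excludes large binaries).
  {:memory, heap_bytes} = Process.info(pid, :memory)

  # Off-heap binaries referenced by the process, as {id, size, refcount} tuples.
  {:binary, bins} = Process.info(pid, :binary)

  binary_bytes =
    bins
    |> Enum.uniq_by(fn {id, _size, _refc} -> id end)
    |> Enum.map(fn {_id, size, _refc} -> size end)
    |> Enum.sum()

  IO.puts("step #{step}: insert #{micros} µs, heap #{heap_bytes} B, binaries #{binary_bytes} B")
end)

stored = Hoarder.count(pid)
```

The same numbers can be watched interactively in the Processes tab of `:observer.start()`.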