Hello Alchemists,
While this doesn't have much to do with Elixir, I hope you'll allow me to pick your brains a bit.
I'm currently working on a statistics engine, which processes game files to generate and persist a bunch of stats for each game.
For each stat entry I save a lot of metadata specific to that stat (unstructured data), which is why I thought Postgres could be a good fit, as I can rely on its jsonb type for the metadata.
This is what the structure looks like now:
```
     Column      |       Type        | Collation | Nullable
-----------------+-------------------+-----------+----------
 id              | bigint            |           | not null
 value           | double precision  |           | not null
 stat_identifier | character varying |           | not null
 player_id       | character varying |           | not null
 user_id         | bigint            |           |
 team_id         | bigint            |           |
 ghost_id        | bigint            |           |
 stat_meta_id    | bigint            |           | not null
 meta            | jsonb             |           |
```
The meta field could contain something like this:
```json
{"side": "CT", "time": 24.7734375, "round": 1, "stage": "de_inferno", "weapon": "P2000", "overtime": false, "was_kill": false, "hit_group": "chest", "victim_id": "76561198856413397", "was_headshot": false, "victim_position": {"x": 197.15223693847656, "y": 1380.577392578125, "z": 104.56356811523438}, "was_self_damage": false, "was_team_damage": false, "attacker_position": {"x": 418.1552429199219, "y": 2116.254638671875, "z": 131.56341552734375}}
```
The goal here is to allow some advanced queries across multiple matches, e.g. how many times we get killed in overtime, on map X, in location Y, by weapon Z.
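To make that concrete, here's a sketch of the kind of query I mean (the table name `stats` and the `stat_identifier` value are made up for illustration; the GIN index is something I'd add to support the containment filter, not something I have yet):

```sql
-- A GIN index with jsonb_path_ops supports the @> containment operator:
CREATE INDEX stats_meta_gin ON stats USING GIN (meta jsonb_path_ops);

-- "How many times did we get killed in overtime on de_inferno by a P2000?"
SELECT count(*)
FROM stats
WHERE stat_identifier = 'death'
  AND meta @> '{"overtime": true, "stage": "de_inferno", "weapon": "P2000"}';
```

The nice part is that any combination of meta keys can go into the containment filter without changing the schema.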
This is perfectly possible with this structure. My problem is scalability: an average game produces around 12,000 rows in that table, and 500 games take up 2 GB of storage.
I'm likely to process 1,000+ games a day, and the volume will only grow; at that rate that's roughly 12 million new rows (around 4 GB) per day.
I don't have much experience with big data (which I guess this qualifies as?). What are my options here? Should I look elsewhere than Postgres?
I spent most of yesterday researching how to scale such a solution.
The obvious solution, of course, is to limit the number of stats put into the database; that's definitely an option, but a last resort.
Cassandra could be a solution: while it doesn't support unstructured data, empty fields take up no space, so I could just promote the meta fields to regular columns.
However, I find it hard to model for Cassandra, since the philosophy is to build tables around your queries/filters (e.g. users_by_id), but the filters applied to my queries vary and the combinations are many.
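To illustrate where I get stuck: a query-first Cassandra table for just one of those filter combinations might look like the CQL sketch below (table and key layout are made up for illustration), and every other filter combination would seemingly need its own table.

```sql
-- CQL sketch: one table serving exactly one query pattern,
-- "deaths for a player on a given map, filtered by weapon".
CREATE TABLE deaths_by_player_map_weapon (
    player_id text,
    stage     text,
    weapon    text,
    time      double,
    round     int,
    overtime  boolean,
    PRIMARY KEY ((player_id, stage), weapon, time)
);
```

With the number of filter combinations I need, that approach seems to explode quickly.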
I could really use some tips or a pointer in the right direction; any help is greatly appreciated.
Thank you