Hello,
I’ve recently developed a module that creates distributed user counters, based on what’s described in this post.
Basically, on each node I use an ETS table to atomically count how many people are in a channel, then I aggregate the counters from all nodes using Phoenix Tracker.
In summary, users start by joining “room:lobby”. Using the ETS table, I look up the last room id that still has available slots (count <= 70), and if I can’t find one, I assign them to a new topic.
Those topics (“room#{id}”) are created in ascending order (“room#1”, “room#2”, …), and I use the same topics on every node.
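To make the local step concrete, here is a minimal sketch of how the per-node assignment could look. The module name, table name and functions are hypothetical (they are not my actual code), and for simplicity it scans from room 1 upward instead of remembering the last room id. It relies on `:ets.update_counter/4` to increment atomically and rolls the increment back when the room is already full.

```elixir
defmodule LocalRooms do
  # Hypothetical names, for illustration only.
  @table :local_room_counters
  @max_users 70

  # Create the named ETS table once, e.g. when the application starts.
  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # Returns the topic of the first room that still has a free slot on THIS node.
  def join(room_id \\ 1) do
    # Atomic increment; inserts {room_id, 0} first if the room is unknown.
    count = :ets.update_counter(@table, room_id, {2, 1}, {room_id, 0})

    if count <= @max_users do
      "room#" <> Integer.to_string(room_id)
    else
      # Room was already full: undo our increment and try the next id.
      :ets.update_counter(@table, room_id, {2, -1})
      join(room_id + 1)
    end
  end

  # Decrement on leave, never going below zero.
  def leave(room_id) do
    :ets.update_counter(@table, room_id, {2, -1, 0, 0}, {room_id, 0})
  end
end
```

After calling `LocalRooms.init()` once, the first 70 calls to `LocalRooms.join()` return “room#1” and the 71st returns “room#2”. Note that the count only reflects joins on the local node.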
With that in place I’m able to create channel topics with a max size of 70 or so users.
It works quite well overall, but I have a “problem” which concerns the total number of users on channel topics.
As I said previously, the first step, before tracking counters across nodes, is to use the local ETS table to get the current count for a topic, so obviously the value I get only covers users who joined on that node.
Therefore, if I deploy my app on 10 nodes, since the max is 70 per node, up to 700 users (10 × 70) can join every single topic.
That’s not good for me because I don’t want those rooms to be too crowded. (I plan on using Phoenix Presence on those channels, which does not scale very well when there are too many users.)
1) First solution:
To solve that, my first thought was to divide the max by the number of active nodes:
For example on 10 nodes, max => 7 users
That means locally 7 users max can join a channel topic.
This works, but it’s not perfect either, because the users will be load balanced across different nodes.
They might fill room topics on some nodes faster than on others, so new topics would be created with barely any users while the previous rooms are not completely filled yet.
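The per-node cap from the first solution could be computed roughly like this. It is only a sketch: `RoomLimit` is a hypothetical module, and it assumes the cluster size can be taken as `length(Node.list()) + 1` (connected nodes plus the local one).

```elixir
defmodule RoomLimit do
  # Hypothetical helper, for illustration only.

  # Split the global max evenly across nodes, keeping at least 1 slot per node.
  def per_node(global_max, node_count) when node_count > 0 do
    max(div(global_max, node_count), 1)
  end

  # Assumption: every connected node serves the same channels.
  def current(global_max) do
    per_node(global_max, length(Node.list()) + 1)
  end
end
```

With 10 nodes, `RoomLimit.per_node(70, 10)` gives 7. Integer division also means the caps don’t always add up to the max: with 3 nodes each one gets 23 slots, so a room tops out at 69 users overall.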
2) Second solution:
To avoid that problem, I could broadcast to every node when a specific room is full or, since room ids are created in ascending order, I could regularly publish the last id with available slots.
This would occur in the room tracker after the aggregate.
The only issue is that there’s always a delay between the moment users are assigned to a specific channel and the aggregate. That information would always arrive too late…
At this point, I’m running out of ideas.
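For reference, the value I’d broadcast after the aggregate could be derived like this. This is a sketch under assumptions: `RoomHint` is a hypothetical module, the aggregated counts are assumed to be a map of `room_id => total users across all nodes`, and it picks the lowest room id that still has a free slot.

```elixir
defmodule RoomHint do
  # Hypothetical helper, for illustration only.
  @max_users 70

  # No rooms known yet: start with room 1.
  def first_open_room(counts) when map_size(counts) == 0, do: 1

  # counts: %{room_id => total users across all nodes} (assumed shape).
  def first_open_room(counts) do
    last_id = counts |> Map.keys() |> Enum.max()

    # Lowest id below the max; if every known room is full, open a new one.
    Enum.find(1..last_id, last_id + 1, fn id ->
      Map.get(counts, id, 0) < @max_users
    end)
  end
end
```

For example, `RoomHint.first_open_room(%{1 => 70, 2 => 35})` returns 2. The result is still only as fresh as the last aggregate, so it doesn’t remove the delay described above.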