Channels for game development - throughput per second?

Hello all,

Over the past several months I have been working on a social-interaction VR game as a side project. I currently use Phoenix Channels and Presence for friends-list management and basic room notifications such as join/leave.

Player movement is currently handled by a C# socket-based solution. I’ve been debating moving player movement into a Phoenix channel to shrink my code base, but I haven’t found much information on how much data channels can handle before they start falling behind.

Currently a player may send a max of 60 updates per second. These updates are binary strings, typically about 30 characters long. On average there would be 15-20 players connected to a topic, and each update would need to be broadcast to every other player in the room.
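For a rough sense of scale, that worst case works out to about 60 × 20 = 1,200 inbound messages per second per room, and since each update gets rebroadcast to the other 19 players, roughly 1,200 × 19 ≈ 22,800 outbound messages per second, at ~30 bytes of payload each.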

Has anyone used channels this way? Does it seem possible or does a pure socket based implementation sound better overall?

Thanks,
Cody

3 Likes

I’m currently doing something similar. The VR front end (WebXR) runs in the browser using Babylon.js. I use Phoenix channels for players to connect to a common room id. Headset and controller position and rotation data are collected per frame using RxJS observables, then throttled and merged into a single combined payload sent no more often than every 100 ms, so roughly 10 messages per second. The channel broadcasts these movements to the room channel, which in turn updates a named GenServer (also named after the room id) whose state contains all of the players’ positions and rotations.

The idea behind the GenServer is that players can create their own private rooms, like Zoom meeting rooms. The GenServer updates its internal state asynchronously as it continuously receives individual player movements. It then broadcasts all player positions and rotations every 100 ms to all players, using Phoenix PubSub, back to the channel in JavaScript land. This way we send one large payload every 100 ms rather than a bunch of small payloads all the time.
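A minimal sketch of that pattern (module, registry, and topic names here are placeholders, not my actual code):

```elixir
defmodule VrGame.RoomServer do
  use GenServer

  @tick_ms 100

  def start_link(room_id) do
    GenServer.start_link(__MODULE__, room_id, name: via(room_id))
  end

  # Channels call this on every incoming movement message.
  def update_player(room_id, player_id, pose) do
    GenServer.cast(via(room_id), {:update, player_id, pose})
  end

  @impl true
  def init(room_id) do
    :timer.send_interval(@tick_ms, :broadcast)
    {:ok, %{room_id: room_id, poses: %{}}}
  end

  @impl true
  def handle_cast({:update, player_id, pose}, state) do
    # Keep only the latest pose per player; stale ones are overwritten.
    {:noreply, put_in(state.poses[player_id], pose)}
  end

  @impl true
  def handle_info(:broadcast, state) do
    # One combined payload per tick instead of many tiny ones.
    Phoenix.PubSub.broadcast(
      VrGame.PubSub,
      "room:" <> state.room_id,
      {:poses, state.poses}
    )

    {:noreply, state}
  end

  defp via(room_id), do: {:via, Registry, {VrGame.RoomRegistry, room_id}}
end
```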

The result currently has slightly noticeable choppy animation, but I can fix that later by blending keyframes locally. I’ve tested this setup with only my friends, about 8 people, and it seems to work fine with one room at least.

But like you, I don’t know where the tipping point is. How many rooms can I support? How many messages per second can be handled per room, or per machine for that matter? I guess this depends on how beefy the machine is. I’m also not sure how scaling up to more servers helps me scale a room, because in my design the room is a GenServer and a single point of failure. Even in a multi-server setup, the websocket connections could be on different machines, but their messages would all need to be forwarded to the same Room GenServer on one of those machines.

I read briefly about Phoenix Presence, but as I understand it, it does a ton of data replication, which I thought wasn’t suitable for the firehose of movement information I would be throwing at it. So I just use a single GenServer to hold attendance and state in a room. I might be wrong about this design (any feedback is welcome).

I haven’t even tried optimizing the messages; I’m sending the standard JSON that comes out of the box with Phoenix, and I don’t know how to use binary serialization.

Voice and video chat are done with a third-party JavaScript library. I’m on the free tier, but at scale it would no longer be free. What do you use?

Would love to see what you’re working on. DM me, maybe we can share thoughts or collaborate.

Homan

2 Likes

I’m not an expert but some thoughts:

From what I understand of game networking, there’s one issue you’ll always have with a TCP-based transport like websockets: it’s an ordered, synchronous transport. Sender and receiver require messages to be assembled in order, with re-sending of dropped packets and so on. That’s why many games use UDP for transmitting changes in the world, and it makes sense: if you spend time guaranteeing a message is fully delivered, by the time it is, it might (will) already be outdated, so it’s better to be working on the newer ones that arrived in the meanwhile.

I don’t think you can solve this with websockets, but I imagine you can attenuate it if:

  • You use a transport/encoding that requires less serialization time and produces smaller payloads (see the Elixir sketch after this list). This reduces latency because there’s less data and less time spent marshalling it, and smaller messages also mean less chance of dropped packets per message.
  • You use the smallest relevant unit of info you can. If you’re still using JSON, try to minimise all keys and fields; in TCP terms this helps with packet fragmentation, although it probably increases serialization work overall since more, smaller messages have to be serialised (which a better transport can help with).
  • You reduce the contention point on dispatching updates (remove synchronisation of the sending of updates).
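For illustration, a compact fixed-layout encoding in Elixir could look something like this (the field layout is made up):

```elixir
# Hypothetical fixed-layout pose packet: a player id plus position and
# rotation as 32-bit floats - 29 bytes total, no keys, no quoting.
defmodule PosePacket do
  def encode(id, {x, y, z}, {qx, qy, qz, qw}) do
    <<id::8, x::float-32, y::float-32, z::float-32,
      qx::float-32, qy::float-32, qz::float-32, qw::float-32>>
  end

  def decode(<<id::8, x::float-32, y::float-32, z::float-32,
               qx::float-32, qy::float-32, qz::float-32, qw::float-32>>) do
    {id, {x, y, z}, {qx, qy, qz, qw}}
  end
end
```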

The last 2 points are relevant for this:

It then broadcasts all player positions and rotations every 100ms to all players using phoenix pubsub back to the channel in javascript land. This is so we send one large payload every 100ms rather than a bunch of small payloads all the time.

Your animation will never be more fluid than 10 frames per second, even taking out all latencies and packet dropping, although you can help it by, as you said, using frame blending. I imagine that ideally you want to send updates as soon as you’re able to process them, in the smallest unit you want (I don’t know how this fares in practical terms using JSON, but theoretically it would be the right thing), especially because you’re already delaying the client’s sending of the information.

To actually test this, I think the best approach would be to deploy a version somewhere online, write some modules that create websockets and send messages at given intervals, and time the rate and type of messages received from the server - starting from a low number and increasing the throughput of the spam/clients as you go. You might also want to add something server side to see how the GenServers fare there; the least obtrusive option would be some counters that just get updated on each received and dispatched message, and when you send the “last” message, the server answers with its stats.
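Something like this could do for the counters (a rough sketch, all names made up):

```elixir
# Hypothetical stats counter: channels bump it on every message in/out,
# and the load-test client asks for the totals at the end of a run.
defmodule LoadTest.Stats do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def bump(kind) when kind in [:received, :dispatched] do
    GenServer.cast(__MODULE__, {:bump, kind})
  end

  def report, do: GenServer.call(__MODULE__, :report)

  @impl true
  def init(_), do: {:ok, %{received: 0, dispatched: 0}}

  @impl true
  def handle_cast({:bump, kind}, state) do
    {:noreply, Map.update!(state, kind, &(&1 + 1))}
  end

  @impl true
  def handle_call(:report, _from, state), do: {:reply, state, state}
end
```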

For keeping connections on the same machine the room (GenServer) is running on, there probably isn’t an easy switch you can flip. I remember (and it’s probably still the case) that when joining game servers you would pick the server you wanted; most games would show a ping/latency reading plus number of players, type, etc. If you have somewhat static servers (or pools of servers), then subdomaining them could work: the room would hold information about which subdomain points to the server hosting it, and when opening a socket you would use that. Beyond that, for highly dynamic servers, I think you would need to implement those things with the help of proxies/load balancing, where the client says “I want to join this room”, the request goes to your cluster, and the host that receives it answers with information the client can then use in a request the LB/proxy can route to the right place.

1 Like

So great to see two very well put together comments on this!

Homan, it sounds like you and I have very similar projects - the major difference being the front-end presentation. I also decided to use one GenServer per room for a few small pieces of state, though I don’t store each player’s location (when a new player joins, the current players broadcast their info packets for it to use as a starting point). I recently began incorporating Presence to track active users and statuses, but I have the same concern about replication between all nodes.

My goal is to send no more than 25 location packets per second. The library I’m using has some extrapolation built in that does fairly well at filling in those gaps for smooth movement. I like your idea of batching the updates that go back to players and will have to consider that as well.

I currently send a CSV string with a fixed movement format to my socket server. I plan on looking at MessagePack if I do go the Channel route. My data packet contains a Base64 encoding of the position/rotation information along with an id for the network character - a couple of ideas copied from Unity’s legacy networking layer.
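For reference, the MessagePack route on the Elixir side could look something like this, assuming the Msgpax package (the payload shape is made up for illustration):

```elixir
# Assumes {:msgpax, "~> 2.3"} in mix.exs; field names are illustrative.
payload = %{"id" => 7, "pos" => [1.5, 0.0, -3.2], "rot" => [0.0, 0.7, 0.0, 0.7]}

packed = Msgpax.pack!(payload, iodata: false)   # compact binary
unpacked = Msgpax.unpack!(packed)               # back to a map
```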

For voice and video I’m utilizing WebRTC through a gateway (Janus). Channels currently handle the messaging required to start up the connection for each room’s audio/video component. I’ve used Janus with large loads previously and have had great results at low cost, though it does have a lot of drawbacks in terms of high availability.

Always good to see someone with similar interests! I post progress on twitter @v_realms occasionally.

@amnu3387 has a great point on UDP vs TCP in game design. My current socket server is UDP, which made a big difference in message throughput; however, I also had to do a lot of extra coding to handle the “what if this packet doesn’t arrive” situation. I do have some reservations about switching to a TCP-backed system, but if I can get near the same performance, loosen up some of the “safety code”, and start distributing more, it may be worth it.

I think my plan of action is going to be to switch over the movement code and do some actual testing on multiple servers in different locations to see if it can handle what I need. Once I get my baseline with a handful of users, I’ll spin up a couple of test clients to generate a bigger load and see how everything performs.

1 Like

Phoenix channels now support binary encoding, so you could come up with your own binary data format and send that to save bits and CPU time on the server.
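A minimal server-side sketch, assuming a recent Phoenix version where binary payloads arrive and are pushed as `{:binary, data}` tuples (channel and event names are just examples):

```elixir
defmodule VrGameWeb.RoomChannel do
  use Phoenix.Channel

  def join("room:" <> _room_id, _params, socket), do: {:ok, socket}

  # Binary frames from the client arrive as {:binary, data}.
  def handle_in("move", {:binary, data}, socket) do
    # Relay the raw bytes to everyone else in the room - no JSON involved.
    broadcast_from!(socket, "move", {:binary, data})
    {:noreply, socket}
  end
end
```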

5 Likes

This is great news. I have yet to find a good end-to-end blog post example of how to do this. Some of the blog posts only have Part 1, and I’m still waiting for Part 2 :grin:, so I’ve been putting this off as an optimization that can come later.

Yes, I came to similar conclusions in my research. UDP would be preferred, since for most games it’s more important to have the latest information than to have all the information correct and in order. However, I’ve found no easy way to send UDP from a browser, which is my target platform. I think WebRTC data channels might be able to do it peer to peer, but I’m using a commercial WebRTC product that does not support data in its current version.

In my research into other game networking solutions, I saw that some did decide to buffer data, preferring larger, slightly less frequent payloads over very frequent tiny payloads because of the per-message overhead. I guess this kind of thing depends on the serialization envelope as well as exactly how frequent we’re talking about, and needs to be measured, since each project’s mileage may vary depending on what it’s doing.

I would consider binary serialization over JSON, but for my MVP I’m finding I have a lot of work to do in other areas, so I can always come back to this as an optimization. :slight_smile:

I can maybe also do optimizations such as only sending frequent updates to clients that are currently looking at other players (those appearing within the camera’s frustum), and otherwise sending only periodic updates. There’s no sense in sending large amounts of updates for players behind someone’s back. Of course this comes with its own challenges and complexities…
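A rough sketch of that check on the server side, assuming the room process knows each player’s position and forward vector (all names made up, and treating the frustum as a simple view cone):

```elixir
# Hypothetical interest check: a target gets frequent updates only if it
# lies within 60 degrees of the observer's forward direction (a 120° cone).
defmodule InterestFilter do
  @cos_half_fov :math.cos(:math.pi() / 3)

  def in_view?(observer_pos, observer_forward, target_pos) do
    to_target = normalize(sub(target_pos, observer_pos))
    dot(observer_forward, to_target) >= @cos_half_fov
  end

  defp sub({x1, y1, z1}, {x2, y2, z2}), do: {x1 - x2, y1 - y2, z1 - z2}
  defp dot({x1, y1, z1}, {x2, y2, z2}), do: x1 * x2 + y1 * y2 + z1 * z2

  defp normalize({x, y, z} = v) do
    len = :math.sqrt(dot(v, v))
    if len == 0.0, do: {0.0, 0.0, 0.0}, else: {x / len, y / len, z / len}
  end
end
```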