Is it cheaper to broadcast a small amount of data (over a Phoenix Channel) frequently or to gather it and broadcast the total fewer times?

I built a simple (and unauthoritative) game server that’s remarkably similar to a chat server. It’s a message bus that receives updates from users connected to a certain topic in a (Phoenix) Channel regarding their position, rotation, etc.

Each socket passes the update as a message to a process named UpdateDispatcher. The process runs a function, dispatch(), in an infinite loop. @tickrate times per second, it gathers all the messages in its inbox, organizes them into a Map, and calls Endpoint.broadcast!/3 to send them to the users.

# ------------------------
# Module: UpdateDispatcher

def start_link() do
  pid = spawn_link(&dispatch/0)
  true = Process.register(pid, __MODULE__)
  {:ok, pid}
end

def dispatch() do
  # Time the gather-and-broadcast step so it can be subtracted from the sleep.
  {micros_elapsed, _} =
    :timer.tc(fn ->
      messages = receive_messages()

      if messages != %{} do
        Endpoint.broadcast!("world:lobby", "world_updates", messages)
      end
    end)

  # Sleep out the rest of the tick (sleep_duration/0 is in ms, :timer.tc returns µs).
  max(sleep_duration() - micros_elapsed / 1000, 0)
  |> round()
  |> Process.sleep()

  dispatch()
end

defp sleep_duration(), do: (1000 / @tickrate) |> round()
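
# Roughly what receive_messages/0 does: drain the inbox into a Map keyed
# by user_id (a sketch; the real merge logic may differ).
defp receive_messages(acc \\ %{}) do
  receive do
    {:user_update, user_id, data} ->
      # Keep the latest data per user, merging partial updates together.
      receive_messages(Map.update(acc, user_id, data, &Map.merge(&1, data)))
  after
    # Inbox drained for this tick.
    0 -> acc
  end
end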

# --------------------
# Module: WorldChannel

@impl true
def handle_in("user_update", %{"user_id" => user_id, "data" => data}, socket) do
  send(UpdateDispatcher, {:user_update, user_id, data})
  {:noreply, socket}
end

Currently, @tickrate is 10, so the dispatcher sends updates every 100 ms. The users send in theirs at the same rate, though they may not be synced up.

I built it that way out of fear of quadratic time complexity, since every user would otherwise broadcast their updates to all other users. That may work for a chat app, I thought, but my updates will be coming in way more frequently, so it won’t work for me. The idea came from this answer to a question I asked on Reddit.

However, is that really true? I made the assumption without knowing much at all about Elixir and Erlang. My method should, theoretically, cut the number of messages delivered per tick from quadratic to linear in the number of users, but it still sends roughly the same amount of data on the wire, so does it make any difference, or is the Erlang VM perfectly capable of handling the load?

Did I even need to bother, or could I simplify the application by ridding it of UpdateDispatcher and using Phoenix.Channel.broadcast_from!/3 inside the handle_in("user_update", ...)?
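
In other words, something like this (a sketch; the clients would then receive one “user_update” per sender instead of a batched “world_updates”):

# Module: WorldChannel (hypothetical simplified version, no UpdateDispatcher)

@impl true
def handle_in("user_update", %{"user_id" => user_id, "data" => data}, socket) do
  # Relay each update to every other subscriber as soon as it arrives.
  broadcast_from!(socket, "user_update", %{"user_id" => user_id, "data" => data})
  {:noreply, socket}
end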

How could I even benchmark it? My goal is to reduce the server costs.


Some thoughts in no particular order:


On overload:

PubSub.broadcast! ultimately uses Registry.dispatch and send to deliver the message to each local recipient, semi-synchronously (“semi” because all the messages have to be sent before returning, but not necessarily handled).

send is an inexpensive operation, but it isn’t free - so there’s a hard upper limit on how many clients a single-process UpdateDispatcher can support on one node. If it takes more than 100ms to do all the sends, then the whole system will slip farther and farther behind.
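
Roughly, the local fan-out follows this pattern (a simplified sketch of the idea, not the actual Phoenix.PubSub source; MyApp.PubSub and the message shape are placeholders):

# Simplified sketch of local PubSub fan-out: one send/2 per subscriber.
def fanout(topic, message) do
  Registry.dispatch(MyApp.PubSub, topic, fn entries ->
    # Each entry is {pid, value} for a locally subscribed process.
    for {pid, _value} <- entries, do: send(pid, message)
  end)
end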


On lag:

Buffering messages for (up to) 100ms means that a player of the game will see an effective “lag” 50ms higher than their connection’s latency on average. That would be frustrating for a fast-moving game like an FPS.

On the other hand, collecting all the messages every 100ms means that a laggy client might not get an update in during the window at all, while clients whose clocks are running fast might send two updates. Every listener of user_update is going to need to grapple with missing or extra values in data.

One way to deal with that situation is to collect responses differently, if the game’s semantics support it:

  • accumulate updates for either 100ms or until every user has checked in
  • “collapse” the updates by combining multiple updates from a single user and filling in “same as last time” for users that didn’t report during the interval

This would allow listeners to user_update to get consistent input.
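
For instance, the “collapse” step could be as simple as this (a sketch; it assumes the dispatcher keeps the last reported data per user):

# Sketch: window_updates is this window's %{user_id => data}; last_known is
# the previous full picture. Users who didn't report keep their old value.
defp collapse(window_updates, last_known) do
  Map.merge(last_known, window_updates, fn _user_id, old, new ->
    # Combine partial/multiple updates from the same user.
    Map.merge(old, new)
  end)
end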


Thank you for the insightful answer!

That’s an excellent point. It did cross my mind (which is why I clamped the sleep duration to a minimum of 0), but I thought it would only happen in occasional spikes. You’re right, though: given enough connected users, the server may consistently fall behind. If I’m going to keep this code, I should test how many users that is.

I take it that’s a point in favor of not gathering messages and instead sharing them as they come? Doing everything from one process doesn’t scream, “Scalable!”

On the other hand, if I want to keep my current solution, perhaps I could route the updates through a manager process that forwards them to the most “free” dispatcher process and creates a new one when the existing processes are close to max capacity.
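
For example (names here are hypothetical), instead of tracking which dispatcher is the most “free”, I could hash-partition users across a fixed pool of registered dispatchers:

# Hypothetical: @dispatcher_count processes registered as :update_dispatcher_0,
# :update_dispatcher_1, ...; each user's updates always go to the same one.
defp dispatcher_for(user_id) do
  :"update_dispatcher_#{:erlang.phash2(user_id, @dispatcher_count)}"
end

# In WorldChannel.handle_in/3:
# send(dispatcher_for(user_id), {:user_update, user_id, data})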

However, that’s more complexity. If per-update broadcasts are not an abuse of the system and are, in fact, how Phoenix Channels were intended to be used, that would simplify things a ton.

Don’t quote me on this, but I think that would be true for any game with a tick rate of 10, regardless of its networking implementation. Valorant, for example (a cross between CS:GO and Overwatch), runs 128-tick servers!

The app I’m building is not a game, but more akin to a space people can hang out at. There’s no competition, nor is it fast-paced, so I think an update every 100 ms is good enough. I tested it and it felt OK to me. If I change my mind, I can always increase @tickrate.

The way I tackled this is three-fold:

  1. If multiple updates of the same type come in from the same player in one “update window” (e.g. two position updates, but not one position and one rotation update), the later update overwrites the earlier ones.
  2. Clients only send updates for values that have changed. If a player is AFK, they won’t send anything to the server.
  3. The server saves the updates it receives into a Map stored in an Agent (newer data overwriting old), but it only broadcasts the updates of the players that sent one in during that window (the “deltas”), not the whole Map. That way, the client only executes the updates it receives, leaving unchanged values alone. This ought to save both server costs and client-side computation time.

The reason I store the current state of the world on the server is so that newly-joined players can be given the latest snapshot of the world. Afterward, they’ll receive deltas alone, just like everyone else.
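
A minimal sketch of that piece (module and function names here are illustrative, not my exact code):

# Module: WorldState (illustrative sketch)

def start_link(_opts), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

# Merge a window's deltas into the stored world state (newer data wins).
def apply_deltas(deltas) do
  Agent.update(__MODULE__, fn world ->
    Map.merge(world, deltas, fn _user_id, old, new -> Map.merge(old, new) end)
  end)
end

# Full snapshot for a newly-joined player; after that they get only deltas.
def snapshot(), do: Agent.get(__MODULE__, & &1)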

There’s no requirement that one PubSub message turns into one Channel message either - for instance, you could have the channel do the “accumulate messages with a timeout” thing and only send them to the client periodically. That helps with scaling since there’s a channel per client.
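
A rough sketch of that, using intercept/handle_out to buffer broadcasts in the channel process and a timer to flush them (the 100 ms interval, the :flush message, and the batched payload shape are placeholders):

# Module: WorldChannel (sketch of per-channel batching)

intercept ["world_updates"]

@impl true
def handle_out("world_updates", payload, socket) do
  # Buffer the broadcast instead of pushing it right away.
  socket = assign(socket, :buffer, [payload | Map.get(socket.assigns, :buffer, [])])

  # Schedule a flush for this window if one isn't already pending.
  socket =
    if Map.get(socket.assigns, :flush_scheduled, false) do
      socket
    else
      Process.send_after(self(), :flush, 100)
      assign(socket, :flush_scheduled, true)
    end

  {:noreply, socket}
end

@impl true
def handle_info(:flush, socket) do
  push(socket, "world_updates", %{"batch" => Enum.reverse(socket.assigns.buffer)})
  {:noreply, socket |> assign(:buffer, []) |> assign(:flush_scheduled, false)}
end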