Question about Presence metadata limitation?

peppy · October 24, 2021, 12:53am

I was reading the documentation about Presence metadata and came across this bit: Phoenix.Presence — Phoenix v1.6.2

"Presence metadata should be minimized and used to store small, ephemeral state, such as a user’s “online” or “away” status. More detailed information, such as user details that need to be fetched from the database"

Question: Why is it recommended that presence metadata be minimized? Does too much metadata slow down the system or something?

The reason I’m asking is because I had a nice set up where when a user joins a chat room, I run this script:

def handle_info(:after_join, socket) do
  {:ok, _} = Presence.track(socket, "#{socket.assigns.user_id}", %{
    online_at: inspect(System.system_time(:second)),
    status: socket.assigns.status,
    profilephoto: socket.assigns.photo,
    username: socket.assigns.username,
    gender: socket.assigns.gender
  })
  push(socket, "presence_state", Presence.list(socket))
  {:noreply, socket}
end

It tracks the user in the Presence setup for the room, so other users can see them online. At the same time, it takes the data already assigned on the user’s socket and attaches it to the Presence metadata, passing along a few basic fields such as the user’s profile image and username, so that a user joining a chat room renders nicely on the client side.

Is this the incorrect way of doing this? In the documentation, it recommends using “fetch” to fetch user details from the database instead of storing it in the Presence. Seems like it would be much easier copying the existing data from a socket and putting it in the Presence?

Or are they recommending using client-side AJAX or something to fetch each user’s details after they appear in the room?

kip · October 24, 2021, 4:24am

Phoenix Presence is implemented using a state-based delta CRDT.

CRDTs provide a mechanism for conflict-free replicated data (hence the name). But there is no totally free lunch, and in this case performance deteriorates with large deltas, both in calculating the deltas and in shipping them to all nodes in a cluster. Therefore keeping the presence state small, as you noted in the documentation, is highly recommended.

The state you are pushing in your example doesn’t meet this requirement, and all of it except :online_at can be derived outside the presence state.
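To make the suggestion concrete, here is a minimal sketch of trimming the metadata before tracking. The module name `PresenceMetaSketch` and the choice of `:status` as the only per-connection key are assumptions for illustration, not part of Phoenix; the resulting map is what would be passed as the last argument to `Presence.track` in the original `handle_info`.

```elixir
defmodule PresenceMetaSketch do
  # Assumption: only :status can differ per connection. Everything else
  # (username, photo, gender) can be looked up from the user id later.
  @ephemeral_keys [:status]

  # Builds the small map intended as the metadata argument of Presence.track.
  def minimal_meta(assigns, now \\ System.system_time(:second)) do
    assigns
    |> Map.take(@ephemeral_keys)
    |> Map.put(:online_at, inspect(now))
  end
end

# Hypothetical assigns, mirroring the fields from the question.
assigns = %{
  user_id: 42,
  status: "online",
  photo: "/img/42.png",
  username: "peppy",
  gender: "unspecified"
}

meta = PresenceMetaSketch.minimal_meta(assigns, 1_635_000_000)
IO.inspect(meta)
```

The user id stays as the presence key, so clients or a `fetch` override can still resolve the remaining details from it.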

peppy · October 24, 2021, 10:48pm

Thanks for the details and scientific abstract. I understand now, so I’ll need to try reducing that presence data and optimizing my app for the long term.

I saw in the documentation that there is a “fetch” function for retrieving the user details from the database as a separate step when Presence.list is called. However, since all the user data is already stored in each socket as “socket.assigns”, I was hoping there was some simple way to copy that data from the sockets into the Presence.list result, avoid that redundant call to the database, and keep the presence data minimized at the same time.

Is there a way to do this?

dom · October 25, 2021, 1:27am

Each channel is a separate process, and Erlang won’t let you access memory owned by another process directly, for a few reasons: to avoid concurrency bugs, to support distribution across multiple nodes, and to avoid global GC pauses.

You could ask each process via message/reply, but it won’t scale very well. If DB load is a problem, caching libraries (con_cache, cachex) could be a good fit.
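For reference, the `fetch` override mentioned in the documentation could look roughly like the sketch below. `UserLookup` is a hypothetical in-memory stand-in for whatever backs the lookup (a database query or one of the cache libraries above), and `FetchSketch` is a plain module so the snippet runs on its own; in a real app the `fetch/2` function would live in the module that calls `use Phoenix.Presence`.

```elixir
defmodule UserLookup do
  # Hypothetical stand-in for a database query or cache lookup.
  @users %{
    "1" => %{username: "peppy", photo: "/img/1.png"},
    "2" => %{username: "kip", photo: "/img/2.png"}
  }

  # One batched lookup for all ids instead of one query per user.
  def get_users_map(ids), do: Map.take(@users, ids)
end

defmodule FetchSketch do
  # Same shape as the Phoenix.Presence fetch/2 callback: it receives the
  # topic and the raw presences, and returns them with extra data attached.
  def fetch(_topic, presences) do
    users = presences |> Map.keys() |> UserLookup.get_users_map()

    for {key, %{metas: metas}} <- presences, into: %{} do
      {key, %{metas: metas, user: Map.get(users, key)}}
    end
  end
end

# Made-up presence data: one user, one connection, small metadata.
presences = %{
  "1" => %{metas: [%{status: "online", online_at: "1635000000", phx_ref: "abc"}]}
}

enriched = FetchSketch.fetch("room:lobby", presences)
IO.inspect(enriched)
```

The user details are attached when the list is read, so they never enter the replicated presence state.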

LostKobrakai · October 25, 2021, 8:20am

There’s one additional consideration: Metadata is not merged. If a user is present in 4 places (e.g. 2 tabs + mobile and tablet) then you have one “presence”, but a metadata list of 4 metadata maps. You don’t want to keep data in the presence meta that isn’t going to differ between the places a user is connected from, because it is essentially duplicated. online_at and maybe status would be candidates to keep in presence by that logic. The rest are probably best kept outside of it.
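A rough way to see the duplication is to build the presence entry for one user with 4 connections and compare serialized sizes. This is only a sketch: the module name, the metadata values, and the `phx_ref` strings are made up, and `:erlang.external_size/1` is used purely as a relative measure, not as what Presence actually sends over the wire.

```elixir
defmodule MetaDuplicationSketch do
  # Size in bytes of a term in external term format (relative measure only).
  def size(term), do: :erlang.external_size(term)

  # Simulates one user tracked from `n` connections, each carrying its own
  # copy of the same metadata map plus a distinct (made-up) phx_ref.
  def presence_for(meta, n) do
    metas = for i <- 1..n, do: Map.put(meta, :phx_ref, "ref-#{i}")
    %{metas: metas}
  end
end

fat = %{
  online_at: "1635000000",
  status: "online",
  profilephoto: "/img/42.png",
  username: "peppy",
  gender: "unspecified"
}

# Keep only what can differ between connections.
slim = Map.take(fat, [:online_at, :status])

fat_size = MetaDuplicationSketch.size(MetaDuplicationSketch.presence_for(fat, 4))
slim_size = MetaDuplicationSketch.size(MetaDuplicationSketch.presence_for(slim, 4))

IO.puts("4 connections, full metadata: #{fat_size} bytes")
IO.puts("4 connections, slim metadata: #{slim_size} bytes")
```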

This is one of the places where the nomenclature of Phoenix is a bit tricky. Yes, it’s called socket in all the places, but there are actually different sets of data. The websocket connection handling process creates and holds the initial version of socket, which is then copied into each individual joined channel process and modified individually from there. So what you call socket doesn’t refer to one consolidated set of data, but to a quite fractured one.
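The copy-per-process behaviour can be illustrated without Phoenix at all. In this sketch a plain map stands in for the socket struct and two tasks stand in for channel processes; each one changes its own copy, and the original is left untouched.

```elixir
# A plain map standing in for the socket (assumption: the real struct
# behaves the same way with respect to copying between processes).
socket = %{assigns: %{username: "peppy"}}

# Each task plays the role of a joined channel process. The map is copied
# into the task, and the change made there is invisible everywhere else.
tasks =
  for room <- ["room:a", "room:b"] do
    Task.async(fn -> put_in(socket, [:assigns, :room], room) end)
  end

copies = Task.await_many(tasks)

IO.inspect(socket, label: "original")
IO.inspect(copies, label: "per-process copies")
```

This is also why one channel cannot simply read another channel’s assigns: each process only ever sees its own copy.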