Handling Channels/LiveView LongPoll fallback "unmatched topic"

I’m investigating LiveView JS client errors where the Socket/LiveSocket logs `failed with reason: {"reason":"unmatched topic"}`.

I modified Phoenix 1.8.1 to emit telemetry events when the “unmatched topic” condition happens server side and collected some of those events (including socket and reply) with production traffic for debugging.

It turns out a batch of these errors were all related to the :longpoll transport. I’m trying to understand what the underlying cause might be, and whether I can safely ignore them on the assumption that the channel connection will be re-established later anyway.

I have a proxy (Cloudflare) between users and the server.

This is one of the events, PII scrubbed:

```elixir
%{
  name: [:phoenix, :socket_unmatched_topic],
  metadata: %{
    reply: %Phoenix.Socket.Reply{
      topic: "lv:phx-GGJYSitVRluvhDWD",
      status: :error,
      payload: %{reason: "unmatched topic"},
      ref: "92",
      join_ref: nil
    },
    socket: %Phoenix.Socket{
      assigns: %{},
      channel: nil,
      channel_pid: nil,
      endpoint: MyAppWeb.Endpoint,
      handler: Phoenix.LiveView.Socket,
      id: nil,
      joined: false,
      join_ref: nil,
      private: %{
        connect_info: %{
          x_headers: [
            # ...
          ]
        }
      },
      pubsub_server: MyApp.PubSub,
      ref: nil,
      serializer: Phoenix.Socket.V2.JSONSerializer,
      topic: nil,
      transport: :longpoll,
      transport_pid: #PID<0.33718.0>
    }
  },
  measurements: %{system_time: 1757064975931223300}
}
```

Has anybody explored this area before and could help shed some light?
What additional data could I gather?

My current thought is that the client either misses a ping/heartbeat or the proxy terminates the connection early, causing the server process backing the channel connection to go away. The client probably reconnects, spawning a new process / "lv:..." topic, and the user doesn’t even notice :crossed_fingers:

It would help me to reproduce this locally, so any tips / shortcuts are welcome :slight_smile:

References collected:

In order to preserve the state of the user’s connected socket and to preserve the behaviour of a socket being long-lived, the user’s process is kept alive, and each long-poll request attempts to find the user’s stateful process. If the stateful process is not reachable, every request will create a new process and a new state, thereby breaking the fact that the socket is long-lived and stateful.

Clients subscribe to topics, and Phoenix stores those subscriptions in an in-memory ETS table. If a channel crashes, the clients will need to reconnect to the topics they had previously subscribed to. Fortunately, the Phoenix JavaScript client knows how to do this. The server will notify all the clients of the crash. This will trigger each client’s Channel.onError callback. The clients will attempt to reconnect to the server using an exponential backoff strategy. Once they reconnect, they’ll attempt to rejoin the topics they had previously subscribed to. If they are successful, they’ll start receiving messages from those topics as before.
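To make the "exponential backoff" part of that quote concrete, here is a minimal sketch of a capped exponential schedule. The helper name `backoffDelayMs` and the `baseMs`/`maxMs` values are assumptions chosen for illustration; they are not the Phoenix client's actual defaults.

```javascript
// Hypothetical helper: delay in milliseconds before the nth reconnect attempt (1-based).
// baseMs and maxMs are illustrative values, not the Phoenix client's defaults.
function backoffDelayMs(attempt, baseMs = 100, maxMs = 5000) {
  // Double the delay on every attempt, then cap it at maxMs.
  const delay = baseMs * 2 ** (attempt - 1);
  return Math.min(delay, maxMs);
}

// Delays for the first seven attempts.
const schedule = [1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelayMs(n));
console.log(schedule);
```

The cap matters for a case like the one in this thread: without it, a client that keeps failing would wait longer and longer before it tries to rejoin its topics.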

For some reason, I’m observing cases in which those topics are gone when the client sends a long poll request.

I was able to reproduce. Will investigate!

Should be fixed by phoenixframework/phoenix pull request #6538, "treat 410 as error if we already have a token" (by SteffenDE)!
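Going only by the PR title, the idea can be sketched as a decision on the long-poll response status. This is a simplified illustration, not the actual patch: `classifyPollStatus`, the `hasToken` flag, and the returned labels are hypothetical names, and the assumption is that a 410 without a token means a fresh session, while a 410 after a token was already issued means the server-side session was lost.

```javascript
// Hypothetical classification of a long-poll response status.
// hasToken: whether the client already holds a session token from an earlier poll.
function classifyPollStatus(status, hasToken) {
  switch (status) {
    case 200:
      return "messages"; // the response carries messages to dispatch
    case 204:
      return "empty"; // nothing new, poll again
    case 410:
      // Assumption: with a token already in hand, the old session is gone,
      // so this is an error that should trigger a reconnect and rejoin
      // rather than a silent "open".
      return hasToken ? "error" : "open";
    default:
      return "error";
  }
}

console.log(classifyPollStatus(410, false));
console.log(classifyPollStatus(410, true));
```

Under that assumption, treating the second case as an error is what makes the client rejoin its channels instead of pushing to topics the new server process never saw.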

Thanks @steffend :purple_heart:

The condition seems very plausible to correlate with what we were seeing. I’ll update and hopefully this class of errors will be gone :slight_smile:

I’ll report otherwise! Thanks very much!

Thank you for reporting. That issue was in the code basically since forever, so it’s interesting that this doesn’t seem to have happened very frequently in the past. I hope we’ll get a new Phoenix version out this week!

I guess because LongPoll was not enabled by default it didn’t get as much use?

I’m monitoring WebSocket vs. LongPoll connections and trying to make sense of why the LongPoll connections are happening (kind of a losing game at scale, but interesting enough anyway).

> I guess because LongPoll was not enabled by default it didn’t get as much use?

Possible, yeah. Although it’s been active by default for new projects for a while now (phoenixframework/phoenix commit 0cae42c, "Add longpoll fallback and make LP enabled by default (#5688)") :smiley:

For cross-referencing, a follow-up issue about the same error, now focused on WebSocket connections: phoenixframework/phoenix_live_view issue #4075, "Client side `failed with reason: {"reason":"unmatched topic"}` errors (WebSocket)".

Closing the loop in case anyone finds this in the future:

The “unmatched topic” error is a race condition that can happen when the client tries to push an event to a topic the server doesn’t know about (e.g. because the LiveView process is gone).

The pushEvent JS hook method has two possible signatures: pushEvent(name, payload, onReply): void and pushEvent(name, payload): Promise. The former silently ignores errors, while the latter rejects the promise instead.

I was using the promise-based version for a fire-and-forget event and needed to properly ignore the event on my application’s side.
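A minimal sketch of what ignoring the event looked like. `fakeHook` is a stand-in I made up so the snippet runs outside a browser; it only models the two call shapes described above for the case where the LiveView process is gone. `pushAndForget` is likewise a hypothetical helper name.

```javascript
// Stand-in for a LiveView hook whose LiveView process is already gone.
// Only the two pushEvent call shapes described above are modelled.
const fakeHook = {
  pushEvent(name, payload, onReply) {
    if (typeof onReply === "function") {
      // Callback form: the failure is swallowed and onReply never runs.
      return undefined;
    }
    // Promise form: the failure surfaces as a rejection.
    return Promise.reject(
      new Error('failed with reason: {"reason":"unmatched topic"}')
    );
  },
};

// Hypothetical fire-and-forget helper: deliberately swallow the rejection
// so it does not show up as an unhandled promise rejection.
function pushAndForget(hook, name, payload) {
  return hook.pushEvent(name, payload).catch(() => undefined);
}

pushAndForget(fakeHook, "track", { clicked: true }).then((result) => {
  console.log("ignored failure, result:", result);
});
```

In a real hook the call would be `this.pushEvent("track", payload).catch(() => {})`, or the callback form if the reply is of no interest. Swallowing every rejection is only reasonable for events that really are fire-and-forget.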

The LiveView docs have been updated to document the behavior of pushEvent and pushEventTo.

Thanks @steffend for your help!