Handling Channels/LiveView LongPoll fallback "unmatched topic"

I’m investigating LiveView JS client errors where the Socket/LiveSocket reports `failed with reason: {"reason":"unmatched topic"}`.

I modified Phoenix 1.8.1 to emit telemetry events when the “unmatched topic” condition happens server side and collected some of those events (including socket and reply) with production traffic for debugging.

It turns out that all of the errors in one batch were related to the :longpoll transport. I’m trying to understand what the underlying cause might be, and whether I can safely ignore those errors on the assumption that the channel connection will be re-established later anyway.

I have a proxy (Cloudflare) between users and the server.

This is one of the events, PII scrubbed:

%{
  name: [:phoenix, :socket_unmatched_topic],
  metadata: %{
    reply: %Phoenix.Socket.Reply{
      topic: "lv:phx-GGJYSitVRluvhDWD",
      status: :error,
      payload: %{reason: "unmatched topic"},
      ref: "92",
      join_ref: nil
    },
    socket: %Phoenix.Socket{
      assigns: %{},
      channel: nil,
      channel_pid: nil,
      endpoint: MyAppWeb.Endpoint,
      handler: Phoenix.LiveView.Socket,
      id: nil,
      joined: false,
      join_ref: nil,
      private: %{
        connect_info: %{
          x_headers: [
            # ...
          ]
        }
      },
      pubsub_server: MyApp.PubSub,
      ref: nil,
      serializer: Phoenix.Socket.V2.JSONSerializer,
      topic: nil,
      transport: :longpoll,
      transport_pid: #PID<0.33718.0>
    }
  },
  measurements: %{system_time: 1757064975931223300}
}
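
Given the event shape above, a handler that reduces each event to the few fields relevant for debugging could look roughly like this. It is a minimal sketch: the event name `[:phoenix, :socket_unmatched_topic]` only exists in the patched Phoenix build described in this post (stock Phoenix does not emit it), and the module name, handler id, and `summarize/2` helper are hypothetical.

```elixir
defmodule MyApp.UnmatchedTopicHandler do
  @moduledoc """
  Sketch: logs a compact summary of the custom unmatched-topic telemetry event.
  """
  require Logger

  # Custom event from the patched Phoenix build; not part of stock Phoenix.
  @event [:phoenix, :socket_unmatched_topic]

  # Call once at application start (requires the :telemetry dependency).
  def attach do
    :telemetry.attach("unmatched-topic-logger", @event, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(@event, measurements, metadata, _config) do
    summary = summarize(measurements, metadata)
    Logger.warning("unmatched topic: " <> inspect(summary))
  end

  # Plain map patterns also match structs, so this function does not need
  # Phoenix.Socket or Phoenix.Socket.Reply to be available at compile time.
  def summarize(measurements, %{reply: reply, socket: socket}) do
    %{
      topic: reply.topic,
      ref: reply.ref,
      join_ref: reply.join_ref,
      transport: socket.transport,
      joined: socket.joined,
      system_time: measurements[:system_time]
    }
  end
end
```

Keeping only these fields avoids logging `connect_info` (and any PII in the headers) while still making it possible to group the errors by transport.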

Has anybody explored this area before and could help shed some light?
What additional data could I gather?

My current thought is that either the client misses a ping/heartbeat or the proxy terminates a connection early, causing the server process backing the channel connection to go away. The client probably reconnects, spawning a new process / "lv:..." topic, and the user doesn’t even notice :crossed_fingers:

It would help me to reproduce this locally, so any tips / shortcuts are welcome :slight_smile:

References collected:

In order to preserve the state of the user’s connected socket and to preserve the behaviour of a socket being long-lived, the user’s process is kept alive, and each long-poll request attempts to find the user’s stateful process. If the stateful process is not reachable, every request will create a new process and a new state, thereby breaking the fact that the socket is long-lived and stateful.

Clients subscribe to topics, and Phoenix stores those subscriptions in an in-memory ETS table. If a channel crashes, the clients will need to reconnect to the topics they had previously subscribed to. Fortunately, the Phoenix JavaScript client knows how to do this. The server will notify all the clients of the crash. This will trigger each client’s Channel.onError callback. The clients will attempt to reconnect to the server using an exponential backoff strategy. Once they reconnect, they’ll attempt to rejoin the topics they had previously subscribed to. If they are successful, they’ll start receiving messages from those topics as before.

For some reason, I’m observing cases in which those topics are gone when the client sends a long poll request.

I was able to reproduce. Will investigate!

Should be fixed by phoenixframework/phoenix pull request #6538, "treat 410 as error if we already have a token"!

Thanks @steffend :purple_heart:

The condition plausibly matches what we were seeing. I’ll update and hopefully this class of errors will be gone :slight_smile:

I’ll report otherwise! Thanks very much!

Thank you for reporting. That issue was in the code basically since forever, so it’s interesting that this doesn’t seem to have happened very frequently in the past. I hope we’ll get a new Phoenix version out this week!

I guess because LongPoll was not enabled by default it didn’t get as much use?

I’m monitoring WS vs. LP connections and trying to make sense of why the LP connections are happening (kind of a losing game at scale, but interesting enough anyway).
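
One way such monitoring could be done is to count socket connections per transport using the `[:phoenix, :socket_connected]` telemetry event, whose metadata includes `:transport` and `:result`. The following is a minimal sketch under assumptions: the module name, handler id, and ETS table name are hypothetical, and in a real application the table would be created by a long-lived process (for example under the supervision tree) so that it outlives the caller.

```elixir
defmodule MyApp.TransportCounter do
  @moduledoc """
  Sketch: counts socket connections per {transport, result} in an ETS table.
  """

  # Hypothetical table name.
  @table :transport_counts
  @event [:phoenix, :socket_connected]

  # The calling process owns the table; call this from a long-lived process.
  def init_table do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
    :ok
  end

  # Call once at application start (requires the :telemetry dependency).
  def attach do
    :telemetry.attach("transport-counter", @event, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(@event, _measurements, %{transport: transport, result: result}, _config) do
    key = {transport, result}
    # Atomically increment the counter, inserting {key, 0} first if it is missing.
    :ets.update_counter(@table, key, 1, {key, 0})
    :ok
  end

  # Returns a map such as %{{:websocket, :ok} => 10, {:longpoll, :ok} => 2}.
  def counts do
    @table |> :ets.tab2list() |> Map.new()
  end
end
```

Exporting `counts/0` periodically to a metrics system would show the WS to LP ratio over time, although it does not by itself explain why a given client fell back to long polling.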

> I guess because LongPoll was not enabled by default it didn’t get as much use?

Possible, yeah. Although it’s been active by default for new projects for a while now (see phoenixframework/phoenix commit 0cae42c, "Add longpoll fallback and make LP enabled by default (#5688)") :smiley:

For cross-referencing, a follow-up issue about the same error, now focused on WebSocket connections: phoenixframework/phoenix_live_view issue #4075, "Client side `failed with reason: {"reason":"unmatched topic"}` errors (WebSocket)".