Phoenix Sockets getting stuck at "closed", "connecting" or "disconnecting" state

I am using Phoenix with Apollo and @absinthe/socket-apollo-link to listen to GraphQL subscriptions. Users keep the app open in a tab for a long time without refreshing the page, often for weeks.

Disconnects happen during that time for various reasons: we deploy a new version, users’ devices go to sleep, they disconnect from wifi, or their network is having issues.

Since about the time we updated to Phoenix 1.6.4, we have been getting reports of the page being “stuck”. I managed to reproduce it just by leaving the page open for 24 hours: several WebSocket disconnections happened, and when I came back to my computer, phoenixSocket.isConnected() was false and phoenixSocket.connectionState() was "connecting".

Since then, I have been able to reproduce the issue fairly reliably with two hosts on my local network, as follows.

  1. I launch the app on Host A, on port 4000.
  2. From Host B, I open an SSH connection with port forwarding to Host A, so that port 4000 on Host B forwards TCP connections to Host A on port 4000.
  3. On Host B, I load the page at localhost:4000. The request goes to the locally forwarded port, hits Host A, the page renders, and the connections are established properly.
  4. I disconnect wifi on Host B for 15 seconds or so. This is short enough that the SSH session doesn’t block, but long enough that the Phoenix socket disconnects.
  5. I turn on the wifi on Host B.
  6. I can see the page in Chrome on Host B being stuck. The connection state is usually “closed”, but calling phoenixSocket.connect() has no effect.

I can reset the connection by first calling disconnect() and then immediately calling connect().

I suspect this may be an issue in Phoenix.js or somewhere in my code, but I wonder whether anyone else has had similar issues.


Is this the Brave browser by chance, or does it happen in all browsers you try?

Hey Chris, no, Chrome 96.

I suspect we had a WebSocket issue in production that has since been resolved, and at the same time we had the LongPoll fallback misconfigured. I lowered the heartbeat interval from 30s to 5s, enabled LongPoll, and I also monitor the socket every 5s like this:

// Module-level flag recording that the connection dropped at least once
// (declared here so the snippet is self-contained).
let connDropped = false;

function checkSocketWorks(phoenixSocket) {
  if (!phoenixSocket.isConnected()) {
    connDropped = true;
    /* Phoenix.js seems to hang in 'disconnected', 'connecting', or 'disconnecting' states
     * when the network is unreliable but not totally down. In such a case isConnected()
     * correctly returns false, but you cannot reconnect before dropping the connection
     * manually, which is what disconnect() does. */
    phoenixSocket.disconnect();
    phoenixSocket.connect();
  }
}

The above changes seem to have fixed it for us.
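
For anyone who wants to try the same workaround, here is a rough sketch of how the check could be run every 5 seconds. The names checkAndReconnect and startSocketWatchdog and the intervalMs parameter are made up for illustration and are not part of Phoenix.js. The only socket methods used are isConnected(), disconnect() and connect(), as in the snippet above.

```javascript
// Hypothetical helper (not part of Phoenix.js): performs one check and
// returns true if a reconnect was attempted, false otherwise.
function checkAndReconnect(phoenixSocket) {
  if (phoenixSocket.isConnected()) {
    return false;
  }
  // Drop the stale connection first; in the stuck state, calling
  // connect() on its own had no effect.
  phoenixSocket.disconnect();
  phoenixSocket.connect();
  return true;
}

// Hypothetical helper: runs the check on a timer. The 5000 ms default
// mirrors the 5s interval mentioned in the post and is an assumption,
// not a recommended value.
function startSocketWatchdog(phoenixSocket, intervalMs = 5000) {
  let reconnectAttempts = 0;
  const timer = setInterval(() => {
    if (checkAndReconnect(phoenixSocket)) {
      reconnectAttempts += 1;
    }
  }, intervalMs);
  return {
    // Stop the periodic check, e.g. when the user logs out.
    stop: () => clearInterval(timer),
    // Number of reconnect attempts made so far, useful for logging.
    attempts: () => reconnectAttempts,
  };
}
```

Usage would be something like const watchdog = startSocketWatchdog(phoenixSocket); after the socket is created, and watchdog.stop() on teardown. Keeping the single check separate from the timer makes it easy to exercise with a fake socket object.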

Would you like me to prepare a video or an isolated code example to reproduce the issue? I suspect we are hitting a race condition in the socket JS implementation.
