I’m currently working on a Phoenix Pub/Sub message bus to replace a combination of Google Pub/Sub and an overloaded C# service. The current implementation uses a raw websocket to broadcast messages rather than any sort of Channel abstraction on top of it.
The Phoenix-based message bus manages websocket connections from clients, over which we send information about what needs to happen in our game. In our tests, the message bus performs much more efficiently than our current implementation, but we’re running into the odd behavior described below.
The existing implementation doesn’t appear to have random disconnections (but we’re also not using any channel abstractions on top, either).
The issue we’re seeing:
We’re seeing situations where a client has “disconnected” from the topic they were subscribed to (without sending a phx_leave event). The channel PID exits are monitored and a disconnect message is broadcast to the user’s topic to terminate the connection. This appears to work when I test it locally, but the client application doesn’t appear to know the connection has been terminated.
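For reference, here is a minimal sketch of the monitoring pattern described above, written in plain Elixir so it runs without Phoenix. The module name ChannelWatcher and the on_down callback are hypothetical and not our actual code; in the real bus, the callback is roughly where the disconnect broadcast to the user’s topic would happen.

```elixir
defmodule ChannelWatcher do
  @moduledoc """
  Hypothetical helper: monitors a process and runs a callback when it exits.
  """

  # Spawns a watcher process that monitors `channel_pid` and calls
  # `on_down.(reason)` once the monitored process goes down. If the process
  # is already dead when the monitor is set, the reason is :noproc.
  def watch(channel_pid, on_down)
      when is_pid(channel_pid) and is_function(on_down, 1) do
    spawn(fn ->
      ref = Process.monitor(channel_pid)

      receive do
        {:DOWN, ^ref, :process, _pid, reason} -> on_down.(reason)
      end
    end)
  end
end

# Stand-in for a channel process; it exits as soon as it receives :stop.
channel =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

parent = self()

# Assumption: in the real system this callback would broadcast the
# disconnect message instead of notifying the calling process.
ChannelWatcher.watch(channel, fn reason -> send(parent, {:channel_down, reason}) end)

send(channel, :stop)
```

Note that this only tells the server side that the channel process is gone; it says nothing about whether the client ever receives or acts on the resulting message.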
The websocket timeout is set very high, so a timeout doesn’t appear to be the cause, at least on the server side.
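For context, the server-side timeout I mean is the websocket option on the endpoint’s socket declaration. A sketch with placeholder names and a placeholder value (MyAppWeb.Endpoint, MyAppWeb.UserSocket, the :my_app OTP app, the /socket path, and the one-hour figure are illustrative, not our real configuration):

```elixir
defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  # :timeout is in milliseconds; Phoenix defaults it to 60_000.
  # The value below is a placeholder for "very high".
  socket "/socket", MyAppWeb.UserSocket,
    websocket: [timeout: 3_600_000]
end
```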
The cluster is running on top of GKE using libcluster. The websocket connections are going over an HTTP/S load balancer to our backend services. Connections between the nodes appear to be stable.
My next attempt is going to be sending a shutdown message to the transport PID (rather than using a broadcast) to ensure the socket is actually closed.
Has anyone experienced random channel disconnections without socket terminations?
What is the best practice for terminating a socket connection when we know the user has “unsubscribed” from the channel?
Is there a channel timeout separate from the websocket timeout that I’m missing?
Happy to answer any questions! Thank you all in advance for the help!