Keeping track of WebSocket disconnects

I have a Channels application (WS only) whose clients are phone apps whose source code I control.

The application logs WS connections in the connect/3 callback, but as we know, there is no disconnect callback. I have searched around, and it looks like being notified of a disconnection is not trivial (please let me know if I missed a robust solution).

However, in my local tests it seems that a channel’s terminate/2 is called when the socket is closed just fine. I guess that’s possible due to the process structure. Would it be a bad idea to track disconnections this way?

For example, a simple approach could be the following: First thing after connecting, phones would be required to join a dedicated connected channel whose only purpose is to log on join/3 and on terminate/2. Assuming the client code is correct, the join is correctly programmed, and no manual unjoin is coded, do you see drawbacks or better ways to solve this?

Ah, of course, let me add that if the phone loses signal precisely between the connection and the join, it will disconnect and the app won’t register anything. This edge case is fine.

Given that Phoenix already comes with a sophisticated solution for tracking presences (Phoenix.Presence), I would just use that one.

As a side note, do not use terminate/2 as it is not always called…

Can you be more specific?

The documentation of terminate/2 has no warnings. In which cases does a socket disconnect without the callback being called?

Hmmm, perhaps, but at first sight it seemed overkill to me. I do not want to track presences globally, nor do I want to notify anyone about diffs. I only want to log when a socket is closed.

It’s not specific to WebSockets; it applies to any GenServer…

https://hexdocs.pm/elixir/GenServer.html#c:terminate/2

You can see why it is not always called.

BTW, a simple monitor will catch those disconnections too.
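
To make the two points above concrete, here is a minimal sketch in plain Elixir (no Phoenix involved). TerminateDemo is a hypothetical module invented for illustration; it only reports back when its terminate/2 runs. The sketch assumes an unlinked GenServer that does not trap exits, which is not necessarily how Phoenix sets up its channel processes.

```elixir
defmodule TerminateDemo do
  use GenServer

  # report_to is the pid that gets a message whenever terminate/2 runs.
  def start(report_to), do: GenServer.start(__MODULE__, report_to)

  @impl true
  def init(report_to), do: {:ok, report_to}

  @impl true
  def terminate(reason, report_to) do
    send(report_to, {:terminate_called, reason})
    :ok
  end
end

# Helper: did a {:terminate_called, _} message arrive within 100 ms?
got_terminate = fn ->
  receive do
    {:terminate_called, _reason} -> true
  after
    100 -> false
  end
end

# Case 1: an orderly stop runs terminate/2 before GenServer.stop/1 returns.
{:ok, stopped} = TerminateDemo.start(self())
:ok = GenServer.stop(stopped)
ran_on_stop = got_terminate.()

# Case 2: a brutal kill cannot be intercepted, so terminate/2 never runs,
# yet a monitor still receives the :DOWN message.
{:ok, killed} = TerminateDemo.start(self())
ref = Process.monitor(killed)
Process.exit(killed, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^killed, reason} -> reason
  after
    1_000 -> :timeout
  end

ran_on_kill = got_terminate.()

IO.inspect({ran_on_stop, down_reason, ran_on_kill})
# => {true, :killed, false}
```

The monitor sees the process go down in a case where terminate/2 is skipped, which is why a monitor is the safer hook for logging.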

Could you elaborate on the idea based on a monitor?

On join, you can ask another process, one dedicated to catching :DOWN messages, to call Process.monitor on the joining process.

I’ve found something similar for channels from @chrismccord (hey!) here. I might try going that route, it’s more idiomatic than the ad-hoc channel.

That is exactly what I meant :)

I see a practical blocker: I don’t know how to get the PID of the process to monitor from the socket module.

In theory, if I understand it correctly, that should be socket.transport_pid, but that is still nil within connect/3.

Sorry, I monitor the channel, not the socket… You cannot use self() in the socket’s connect.

See this older topic.

I have read there is an instrumentation event :phoenix_socket_connect, but I am not familiar with Phoenix instrumentation, searching right now in case that may help.

Ah, I see how it goes via :telemetry in the implementation of Phoenix.Logger.

Nah, in the event I do not have the socket, or the assigns, or the socket ID. You get these arguments:

[:phoenix, :socket_connected]
%{duration: 398900}
%{
  connect_info: %{},
  endpoint: DriverTelemetryWeb.Endpoint,
  log: false,
  params: %{"jwt" => "foo", "vsn" => "2.0.0"},
  result: :ok,
  serializer: Phoenix.Socket.V2.JSONSerializer,
  transport: :websocket,
  user_socket: DriverTelemetryWeb.DriverSocket,
  vsn: "2.0.0"
}
:ok

Running out of ideas with the existing APIs.

I’ll sleep on it.

You can do it by defining your own Phoenix.Socket and implementing init with a track call. I didn’t want to do all of that, and was okay with a warning appearing, so I just defined the function before the use Phoenix.Socket call. See pushex/lib/push_ex_web/channels/push_socket.ex on master in the pushex-project/pushex repository on GitHub.

That isn’t best practice, but demonstrates how you can track a Socket instead of a Channel. You enter a world of “may not be supported” once you do this, but I think it’s pretty small here.

Awesome, so I have a preliminary solution for this. Let me share it.

First, I think the natural place to log the connection is connect/2:

defmodule MyAppWeb.DriverSocket do
  # Simplified: `use Phoenix.Socket` and the app's own verify/1 are omitted here.
  require Logger

  def connect(%{"jwt" => jwt}, socket) do
    case verify(jwt) do
      {:ok, driver_id} ->
        Logger.info("Driver #{driver_id} connected")
        {:ok, assign(socket, :driver_id, driver_id)}

      _ ->
        :error
    end
  end
end

We’ll log the disconnection in the monitor:

defmodule MyAppWeb.SocketMonitor do
  use GenServer
  require Logger

  def start_link(_) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def monitor(socket_pid, driver_id) do
    GenServer.call(__MODULE__, {:monitor, {socket_pid, driver_id}})
  end

  @impl true
  def init(sockets) do
    {:ok, sockets}
  end

  @impl true
  def handle_call({:monitor, {socket_pid, driver_id}}, _from, sockets) do
    Process.monitor(socket_pid)
    {:reply, :ok, Map.put(sockets, socket_pid, driver_id)}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, socket_pid, _reason}, sockets) do
    {driver_id, sockets} = Map.pop(sockets, socket_pid)
    Logger.info("Driver #{driver_id} disconnected")
    {:noreply, sockets}
  end
end

You start the monitor when the application boots:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      # ...
      MyAppWeb.SocketMonitor
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

And, finally, the custom definition of init/1 in the socket module (I’ve simplified the one above):

defmodule MyAppWeb.DriverSocket do
  def init(state) do
    res = {:ok, {_, socket}} = Phoenix.Socket.__init__(state)
    MyAppWeb.SocketMonitor.monitor(socket.transport_pid, socket.assigns.driver_id)
    res
  end

  # Must go below, so our init/1 matches first.
  use Phoenix.Socket
end

This redefines an internal function, but the risk seems controlled to me. I guess the application’s test suite would break if Phoenix changed anything related to that.

Of course, it would be really cool if Phoenix had a public API for this use case.

It would be awesome to be able to get rid of the warning, “this clause cannot match because a previous clause…”. Let me /cc @OvermindDL1 since he knows all the tricks.

This alternative does not issue a warning and it is not coupled to the implementation of the original init/1, only to its return value:

defmodule MyAppWeb.DriverSocket do
  use Phoenix.Socket
  defoverridable init: 1

  def init(state) do
    res = {:ok, {_, socket}} = super(state)
    MyAppWeb.SocketMonitor.monitor(socket.transport_pid, socket.assigns.driver_id)
    res
  end
end
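
For readers unfamiliar with the mechanism, here is a minimal sketch of defoverridable plus super in plain Elixir. FakeSocket is a hypothetical stand-in invented for illustration, not Phoenix.Socket; it only mimics the shape of the return value used above, on the assumption that the real init/1 is injected by `use` in a similar way.

```elixir
defmodule FakeSocket do
  # Hypothetical library module: `use FakeSocket` injects a default init/1.
  defmacro __using__(_opts) do
    quote do
      def init(state), do: {:ok, {state, %{transport_pid: self()}}}
    end
  end
end

defmodule WrappedSocket do
  use FakeSocket

  # Mark the injected init/1 as overridable so the definition below
  # replaces it instead of being an unreachable second clause.
  defoverridable init: 1

  def init(state) do
    # super/1 calls the injected definition; we depend only on its return shape.
    res = {:ok, {_state, socket}} = super(state)
    # Stand-in for the SocketMonitor.monitor/2 call: notify the transport pid.
    send(socket.transport_pid, {:tracked, state})
    res
  end
end

result = WrappedSocket.init(:some_state)

tracked =
  receive do
    {:tracked, state} -> state
  after
    0 -> :nothing
  end

IO.inspect({result, tracked})
```

Because the override goes through super, it keeps working if the injected function's body changes, as long as the return value keeps the same shape.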

Oh nice! I didn’t really know you could do that. I like that it’s not dependent on the implementation.