Handle GRPC channel on client side

Hi all,
I’m building a gRPC client using the grpc library; it deals only with unary RPCs.
I want to reuse the same gRPC channel as much as possible instead of opening a new one for each request.
So I decided to create a GenServer that is responsible for

  • creating the channel on app start
  • keeping the channel open and trying to reopen it if it is lost for some reason

My goal was to keep my app running when the gRPC Server is down and be able to re-create the connection once the Server is back again.
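
On app start the GenServer (shown below) is started from the application’s supervision tree. A minimal sketch of that wiring, where App.Application and App.Supervisor are assumed names:

defmodule App.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Starts the channel holder at boot; it tries to connect in init/1.
      App.GRPCChannel
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: App.Supervisor)
  end
end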

GenServer code

defmodule App.GRPCChannel do
  use GenServer
  require Logger

  alias App.Settings

  # client
  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  def channel() do
    case GenServer.call(__MODULE__, :channel) do
      %GRPC.Channel{} = channel -> {:ok, channel}
      _ -> {:error, :no_connection}
    end
  end

  # server
  @impl true
  def init(_) do
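    # Connect eagerly on app start; returns {:ok, channel} or {:ok, :no_connection}.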
    initialize_channel()
  end

  @impl true
  def handle_info({:gun_down, _pid, _protocol, _reason, _killed_streams}, _state) do
    # The underlying connection is gone; drop the channel until someone asks for it again.
    Logger.debug("GRPC Server not connected.")

    {:noreply, :no_connection}
  end

  @impl true
  def handle_info({:gun_up, pid, _protocol}, :no_connection) do
    # No channel is held right now, so any Gun process that comes up belongs to an
    # old connection. Kill it; the next channel() call creates a fresh one.
    Process.exit(pid, :kill)

    Logger.debug("GRPC Client killed zombie Gun process, pid: #{inspect(pid)}")

    {:noreply, :no_connection}
  end

  @impl true
  def handle_info({:gun_up, pid, _protocol}, state) do
    if pid == state.adapter_payload.conn_pid do
      Logger.debug("GRPC Server connected")
    else
      # A Gun process other than the one backing the current channel re-connected.
      Process.exit(pid, :kill)
      Logger.debug("GRPC Client killed zombie Gun process, pid: #{inspect(pid)}")
    end

    {:noreply, state}
  end

  @impl true
  def handle_call(:channel, _from, channel) do
    case channel do
      %GRPC.Channel{} = channel -> {:reply, channel, channel}
      :no_connection -> restart_channel()
    end
  end

  defp restart_channel() do
    case initialize_channel() do
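      # Not replying here means the caller's GenServer.call/2 eventually times
      # out and the caller exits; get_grpc_channel/0 further below catches that exit.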
      {:ok, :no_connection} -> {:noreply, :no_connection}
      {:ok, channel} -> {:reply, channel, channel}
    end
  end

  defp initialize_channel() do
    address = Settings.iam_service_url()
    opts = Settings.iam_service_connection_opts()

    Logger.debug("GRPC Client connecting to gateway at #{address}")

    case GRPC.Stub.connect(address, opts) do
      {:error, error} ->
        Logger.critical("GRPC Client could not connect to GRPC Server. Message: #{inspect(error)}")
        {:ok, :no_connection}

      {:ok, channel} ->
        Logger.debug("GRPC Client connected to the gateway at #{address}, using channel: #{inspect(channel)}")
        {:ok, channel}
    end
  end
end

The main ideas are

  • to only try to re-create the connection when channel() is called and the state is :no_connection.
  • to use the default Gun adapter and its messages to learn about the state of the underlying HTTP/2 connection, and based on that keep either the channel itself or :no_connection as the GenServer state (message shapes sketched below).
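
For reference, the Gun messages being matched look roughly like this (depending on the Gun version, :gun_down may carry one extra trailing element):

# {:gun_up, conn_pid, protocol}
# {:gun_down, conn_pid, protocol, reason, killed_streams}
#
# conn_pid is the Gun connection process; the gun adapter stores the same pid in
# channel.adapter_payload.conn_pid, which is what the zombie check above compares.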

When a gRPC request is fired, it starts by fetching the channel every time, like this:

  def authenticate(resource_id, access_token) do
    with {:ok, %GRPC.Channel{} = channel} <- get_grpc_channel(),
         {:ok, %AuthenticateRequest{} = request} <- build_authenticate_request(resource_id, access_token),
         {:ok, %AuthenticateResponse{} = response} <- Stub.authenticate(channel, request) do
      {:ok, %{success: response.success, user_id: response.user_id}}
    else
      {:error, :iam_service} -> {:error, :iam_service}
      {:error, :no_connection} -> {:error, :iam_service}
      {:error, %GRPC.RPCError{} = _error} -> {:error, :unauthenticated}
    end
  end

  defp get_grpc_channel() do
    try do
      GRPCChannel.channel()
    catch
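      # Typical exit reasons here: {:timeout, {GenServer, :call, _}} when the call
      # gets no reply within the default 5_000 ms, and {:noproc, _} when the
      # GenServer itself is not running.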
      :exit, _ -> {:error, :iam_service}
    end
  end

I need to catch the GenServer’s exit here, since the call may time out and exit the caller when a channel could not be created.

By trial and error (switching the gRPC server on and off) I noticed that a down gRPC server might bring down my entire app through the Supervisor’s restart logic, so I try to avoid letting the GenServer crash (more on the restart limits below).
I also noticed that after restarting the gRPC server, the old Gun process might re-connect and come back while the new one is also alive; that’s why I kill it, to keep only one alive.
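
For the supervisor part, the issue is the restart intensity: with the defaults, more than 3 restarts of the channel process within 5 seconds shut the supervisor, and with it the app, down. A sketch of where those limits live (children and App.Supervisor as in the sketch further up):

# Defaults shown explicitly; raising them (or isolating the channel under its
# own supervisor) keeps repeated failed connects from taking the whole app down.
Supervisor.start_link(children,
  strategy: :one_for_one,
  max_restarts: 3,
  max_seconds: 5,
  name: App.Supervisor
)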

I’m wondering how bad these ideas are…

All feedback is welcome.

You might want to create a backoff strategy for your supervisor (or you can add that logic in your GenServer). Here is a small example of how this can be achieved: erlang - Supervisors with backoff - Stack Overflow
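
A rough sketch of the in-GenServer variant (assuming the :no_connection state is extended to a {:no_connection, delay} tuple, so the other clauses would have to match on that as well, and reusing initialize_channel/0 from above):

@initial_backoff 1_000
@max_backoff 30_000

# Replaces the existing :gun_down clause: kick off the retry loop on disconnect.
@impl true
def handle_info({:gun_down, _pid, _protocol, _reason, _killed_streams}, _state) do
  Process.send_after(self(), :reconnect, @initial_backoff)
  {:noreply, {:no_connection, @initial_backoff}}
end

@impl true
def handle_info(:reconnect, {:no_connection, delay}) do
  case initialize_channel() do
    {:ok, :no_connection} ->
      # Still down: retry later, doubling the delay up to the cap.
      next = min(delay * 2, @max_backoff)
      Process.send_after(self(), :reconnect, next)
      {:noreply, {:no_connection, next}}

    {:ok, channel} ->
      {:noreply, channel}
  end
end

# A scheduled :reconnect may arrive after the channel is already back; ignore it.
@impl true
def handle_info(:reconnect, state), do: {:noreply, state}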

Thanks! I was thinking about it, but then I figured I only need the channel when a request wants to use it. In that case channel() will either return an existing channel or try to create one. With a backoff I’d do the retry earlier, in an automated manner, but I’m not sure what the benefit of that would be.