Logger.Backend.Console hangs and blocks calling process

ananthakumaran · May 31, 2021, 12:23pm

This seems like the same issue we faced [thread] a long time ago. It used to happen about once a month. On older Elixir versions the pod would run out of memory; on newer ones (1.9+) it would just hang. I spent a lot of time trying to work out where the issue was (Docker, the Erlang level, or the Elixir logger), but I was never able to reproduce it consistently.

There is a Jira ticket in the Erlang issue tracker that closely resembles our case. It includes an interesting note that calling the port_info function on the port would unblock it. I added the following Prometheus metrics collector to our services, and the issue stopped occurring after that. I still don’t know what the real problem was, or how calling port_info periodically fixes it.

defmodule Core.Prometheus.StandardIOCollector do
  use Prometheus.Collector
  alias Prometheus.Model

  def collect_mf(_registry, callback) do
    # The VM names its fd ports "<in_fd>/<out_fd>":
    # '2/2' is stderr, '0/1' is stdin/stdout.
    stderr = find_by_name('2/2')
    stdout = find_by_name('0/1')

    if stderr do
      callback.(
        Prometheus.Model.create_mf(
          :erlang_stderr_queue_bytes,
          "STDERR port queue size",
          :gauge,
          __MODULE__,
          stderr
        )
      )
    end

    if stdout do
      callback.(
        Prometheus.Model.create_mf(
          :erlang_stdout_queue_bytes,
          "STDOUT port queue size",
          :gauge,
          __MODULE__,
          stdout
        )
      )
    end

    :ok
  end

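  def collect_metrics(metric, port)
      when metric in [:erlang_stdout_queue_bytes, :erlang_stderr_queue_bytes] do
    # This Port.info call runs on every scrape; it is the periodic
    # port_info call that appears to keep the port from blocking.
    {:queue_size, bytes} = Port.info(port, :queue_size)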
    Model.gauge_metrics([{[], bytes}])
  end

  defp find_by_name(name) do
    Port.list()
    |> Enum.find(fn port -> match?({:name, ^name}, Port.info(port, :name)) end)
  end
end
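
If you don't use Prometheus, the same periodic port_info call can probably be made from a small process instead. This is only a rough sketch of that idea, not something we run: the module name StdioPortPoker, the `interval_ms` option, and the 10 second default are all made up for illustration, and since I don't know the root cause I can't say it is an equivalent fix.

```elixir
defmodule StdioPortPoker do
  @moduledoc false
  # Hypothetical helper: periodically calls Port.info/2 on the stdout and
  # stderr ports, mimicking what the collector does on each scrape.
  use GenServer

  # Port names taken from the collector above.
  @port_names [~c"0/1", ~c"2/2"]
  # Assumed default; pick whatever matches your scrape interval.
  @interval_ms 10_000

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Returns [{name, queue_size_in_bytes}] for every stdio port that is open.
  # Port.info/2 returns nil for a closed port; the pattern-matching
  # generators below simply skip those.
  def poke do
    for port <- Port.list(),
        {:name, name} <- [Port.info(port, :name)],
        name in @port_names,
        {:queue_size, bytes} <- [Port.info(port, :queue_size)] do
      {name, bytes}
    end
  end

  @impl true
  def init(opts) do
    interval = Keyword.get(opts, :interval_ms, @interval_ms)
    schedule(interval)
    {:ok, interval}
  end

  @impl true
  def handle_info(:poke, interval) do
    poke()
    schedule(interval)
    {:noreply, interval}
  end

  defp schedule(interval), do: Process.send_after(self(), :poke, interval)
end
```

It can be started under your application supervisor with a child spec such as `{StdioPortPoker, interval_ms: 10_000}`. The return value of `poke/0` is only there so you can log or inspect the queue sizes if you want to.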