Slow Logger backend slowing down other backends even in async mode?

MarcN · November 27, 2023, 3:47pm

Recently we have been running into an issue where one slow Logger backend is slowing down another Logger backend. This happens even when in async mode when there are only a few messages to log, so it seems unrelated to sync_threshold which was my first thought.

I made a small Mix project showing the behaviour (I tried this in the current elixir:latest image, which has Erlang/OTP 26 and Elixir 1.15.7). The project defines two simplistic Logger backends: MyQuickBackend and MySlowBackend. Both just log to the console using IO.puts but the slow backend then sleeps for 2 seconds after handling a message (to simulate a slow logger, for example logging to a slow remote server). The application itself simply logs 5 warnings. Here are the files:

File tree:

+ config
|--config.exs
+ lib
|--my_quick_backend.ex
|--my_slow_backend.ex 
|--project.ex
mix.exs
mix.lock

In mix.exs:

defmodule Project.MixProject do
  use Mix.Project

  def project do
    [
      app: :project,
      version: "0.1.0",
      elixir: "~> 1.15",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  def application, do: [mod: {Project, []}, extra_applications: [:logger]]
  defp deps, do: [{:logger_backends, "~> 1.0.0"}]
end

In mix.lock:

%{
  "logger_backends": {:hex, :logger_backends, "1.0.0", "09c4fad6202e08cb0fbd37f328282f16539aca380f512523ce9472b28edc6bdf", [:mix], [], "hexpm", "1faceb3e7ec3ef66a8f5746c5afd020e63996df6fd4eb8cdb789e5665ae6c9ce"},
}

In config/config.exs:

import Config

config :logger, backends: [:console, MySlowBackend, MyQuickBackend]

In lib/my_slow_backend.ex:

defmodule MySlowBackend do
  def init(_) do
    {:ok, %{}}
  end

  def handle_event({_level, _gl, {Logger, msg, _ts, _meta}}, state) do
    IO.puts("MySlowBackend: #{msg} - now sleeping")
    Process.sleep(2000)

    {:ok, state}
  end

  def handle_event(:flush, state) do
    {:ok, state}
  end
end

In lib/my_quick_backend.ex:

defmodule MyQuickBackend do
  def init(_) do
    {:ok, %{}}
  end

  def handle_event({_level, _gl, {Logger, msg, _ts, _meta}}, state) do
    IO.puts("MyQuickBackend: #{msg}")

    {:ok, state}
  end

  def handle_event(:flush, state) do
    {:ok, state}
  end
end

In lib/project.ex:

defmodule Project do
  require Logger
  use Application

  def start(_start_type, _start_args) do
    1..5 |> Enum.each(fn n -> Logger.warning("Warning #{n}") end)

    Supervisor.start_link([], strategy: :one_for_one)
  end
end

Run with mix do deps.get, run.

What I see happening is to following: the :console backend logs these 5 warnings instantly as expected, and the MySlowBackend backend logs the warnings at two-second intervals as expected. However, the MyQuickBackend backend is also handling the warnings are two-second intervals, which I did not expect at all. Note that the 5 warnings is far below the default sync_threshold, and in any case, increasing this threshold does not make this problem go away.

Why does this happen? Is this expected behaviour? Am I doing something wrong?

hst337 · November 27, 2023, 3:51pm

Have you tried changing order of backends in the list?

MarcN · November 27, 2023, 4:33pm

Good suggestion. I tried a few permutations and the result is always the same: the :console backend always instant, and both other backends are slow.

ausimian · November 27, 2023, 7:16pm

I think Elixir’s backends run as :gen_event handlers in a single process. Each handler’s callback is processed serially within the Logger’s event manager. This is also why, given your test of 5 successive messages, the order doesn’t matter.

I suspect if you rewrote the slow handler to forward messages to another process, the fast handler would no longer be affected in the same way.

MarcN · November 27, 2023, 8:13pm

Yeah I guess it is something like that, but then I don’t understand why the :console logger does not suffer the same slowdown?

Also, the documentation (LoggerBackends — logger_backends v1.0.0) says the log handling is asynchronous by default so I’m surprised that all backend handlers are in the same process (although I suppose that is still async if viewed from the main thread).

Does this mean each Logger backend should implement a supervisor structure with a child process that does the actual event handling just to make sure you’re not blocking some other processes? If so, it might be good to add some text about that to the docs.

ausimian · November 27, 2023, 9:25pm

I’m guessing (and also typing this on my phone) but is it possible that :console is an erlang :logger handler in contrast to your backend handlers which are Elixir Logger handlers? They go through different paths I think, but prolly better if someone who actually knows chimes in.

josevalim · November 28, 2023, 1:01am

@ausimian is 100% correct in both replies.