Logger messages not caught by ExUnit

baguetteNunchuks · July 22, 2024, 2:01pm

Hey all,
I am testing a GenServer that produces a log depending on the result of a message production func via :brod.

Here is the start of the invocation in the GenServer:

  @spec produce_refresh_message(pid :: pid(), RefreshData.t()) :: :ok
  def produce_refresh_message(pid, refresh_data) do
    GenServer.cast(pid, {:produce_refresh_message, refresh_data})
  end

  @impl true
  @spec handle_cast({:produce_refresh_message, RefreshData.t()}, any()) ::
          {:noreply, any()}
  def handle_cast({:produce_refresh_message, refresh_data}, state) do
    refresh_data
    |> format_message()
    |> do_produce_refresh_message()

    {:noreply, state}
  end

Here’s what the func in question looks like:

  def do_produce_refresh_message(message) do
    case kafka_module().produce_sync(:refresh_message_producer, message) do
      :ok ->
        Metrics.kafka_refresh_message_produced(:success)
        :ok
      {:error, reason} ->
        IO.puts "We're in the error block of do_produce_refresh_message"
        Logger.error(%{message: "Failed to produce refresh message", reason: inspect(reason)})
        Metrics.kafka_refresh_message_produced(:error)
        :ok
    end
  end

I am running a test via ExUnit that verifies that the log message is produced in the error case, but… the log is not being emitted, or at the very least, not being captured by the test process.

Here is the test:

    test "it logs a message when the message production fails",
      %{produce_failure_refresh_data: produce_failure_refresh_data, producer_pid: producer_pid} do
      log_output =
        capture_log(fn ->
          # this only works when the function is called directly from the module.
          # if we call the produce_refresh_message function, we get to the Logger line in the module
          # but the Logger.error call does not get captured by the capture_log function for some reason...
          RefreshMessageProducer.produce_refresh_message(producer_pid, produce_failure_refresh_data)

          # for some reason, the direct handle_cast does produce logs correctly...
          # RefreshMessageProducer.handle_cast({:produce_refresh_message, produce_failure_refresh_data}, nil)

        end)

      assert log_output =~ ~s(\"level\":\"error\")
      assert log_output =~ "you blew it, brod."
    end

As you can see in the comments, when the code path is reached via direct invocation of the handle_cast callback, the Log is captured and the test succeeds… but when we call top level function, it does not.

This confuses me greatly. Clearly we’re hitting the Logger codepath in either case, but why would the direct cast ensure that Logger is picked up by ExUnit?

Some context and things I have tried:
The Kafka module is mocked, and we can verify that the error case in the func is being handled because we can see the associated console logs from IO.puts. So at the very least, we know the correct codepath is being executed.

The Metric emission that occurs below the error log also fires, so we are getting past the Logger call.

To invoke the process, we are using the pid of the GenServer which we grab via start_supervised! in the test setup:
producer_pid = start_supervised!(OurModuleUnderTest)

This thread refers to the Logger level filtering that might impact visibility; this can’t be the issue if we can get the same Logger.error through via a direct handle_cast though? Settings in config and test.exs also check out.

So, yeah. Seems like the Logger.error is being sent somewhere, but not to the test process in the one case. I’ve been banging my head on this for a bit, so any thoughts are welcome!

garrison · July 22, 2024, 3:16pm

I’m afraid you’re over-thinking this a bit GenServer.cast is asynchronous. When you call it, it sends a message to the GenServer process and then returns immediately - it does not block waiting for the operation to complete.

Your capture_log block (and probably the entire test) is terminating before the GenServer finishes executing the function, so it doesn’t see the log message.

There are a couple things you can do to fix this. One would be to use GenServer.call instead, which is synchronous and sends back a return value after execution. In the same vein you could also explicitly send a return message back to the test process, but sticking with the GenServer paradigm would be best if you take a synchronous approach (there is also a GenServer.reply you can take a look at).

However, if you actually do want the function to run asynchronously (which is a perfectly valid use case), you need to do something to ensure that the GenServer finishes before the capture_log returns. The simplest solution would just be to sleep for a short time in your capture_log block after casting (a couple ms would probably be enough, but make sure it doesn’t flake).

https://hexdocs.pm/elixir/1.17.2/Process.html#sleep/1

al2o3cr · July 22, 2024, 3:26pm

Another option that’s more reliable than sleep is something like :sys.get_state - the target GenServer can’t handle the system message until handle_cast returns, so the test always waits exactly long enough.

garrison · July 22, 2024, 4:40pm

Ah, that’s a much better solution I hadn’t seen before Would only work on gen_* processes of course, but that’s perfect in this case.