How to detect which process is getting unexpected messages?

Warning: what I’m about to show is not production safe! Try it out in local development or on a staging server.

You could use Erlang tracing to find out which processes are sending and receiving these messages. Here’s a simple sketch using low-level Erlang tracing functions (the code can be copy-pasted into iex).

Suppose we have the following two modules, powering processes that simulate your situation:

defmodule Receiver do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def init(_arg), do: {:ok, nil}
end

defmodule Sender do
  use Task

  def start_link(_arg), do: Task.start_link(&loop/0)

  defp loop() do
    Process.sleep(2000)
    send(Process.whereis(Receiver), {make_ref(), :bad_arg})
    loop()
  end
end

You could add another process that traces {_, :bad_arg} messages sent from any process. Here’s a very simple version based on Erlang tracing BIFs:

defmodule Tracer do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, nil)

  def init(_) do
    :erlang.trace_pattern(:send, [{[:_, {:_, :bad_arg}], [], []}], [])
    :erlang.trace(:processes, true, [:send])
    {:ok, nil}
  end

  def handle_info({:trace, sender, :send, {_ref, :bad_arg}, receiver}, state) do
    IO.puts([
      "sender info:\n#{inspect(Process.info(sender), pretty: true)}\n\n",
      "receiver info:\n#{inspect(Process.info(receiver), pretty: true)}"
    ])

    {:noreply, state}
  end
end
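If you don’t know what the unexpected messages look like, a variation is to match on the receiver instead of the message shape. For send tracing, the match spec head is [receiver, message], so putting the receiver’s pid in the first position traces every message sent to that process. A minimal sketch; ReceiverTracer is a hypothetical name, and it assumes senders address the receiver by pid (as Sender does via Process.whereis/1):

```elixir
defmodule ReceiverTracer do
  use GenServer

  # Hypothetical variant of Tracer: reports every message sent to the
  # process registered under `name`, whatever the message looks like.
  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name) do
    case Process.whereis(name) do
      nil ->
        # Fail fast instead of tracing with a `nil` receiver.
        {:stop, {:not_registered, name}}

      receiver ->
        # Head is [Receiver, Msg]: the literal pid restricts the pattern
        # to sends targeting `receiver`; :_ accepts any message.
        :erlang.trace_pattern(:send, [{[receiver, :_], [], []}], [])
        # Enable send tracing on all processes, with this process as tracer.
        :erlang.trace(:processes, true, [:send])
        {:ok, receiver}
    end
  end

  def handle_info({:trace, sender, :send, msg, _to}, receiver) do
    IO.puts("#{inspect(sender)} -> #{inspect(receiver)}: #{inspect(msg)}")
    {:noreply, receiver}
  end
end
```

Since the send trace pattern is global to the node, run this instead of Tracer rather than alongside it, and start it after Receiver so the name can be resolved, e.g. Supervisor.start_link([Receiver, {ReceiverTracer, Receiver}, Sender], strategy: :one_for_one).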

Now we can start all these processes:

Supervisor.start_link(
  [Tracer, Sender, Receiver],
  strategy: :one_for_one
)

The output (with a bunch of noise snipped):

19:19:27.949 [error] Receiver Receiver received unexpected message in handle_info/2: {#Reference<0.3363507600.1371275267.244393>, :bad_arg}

sender info:
[
  ...
  dictionary: [
    ...
    "$initial_call": {Sender, :"-start_link/1-fun-0-", 0},
    ...
  ],
  ...
]

receiver info:
[
  registered_name: Receiver,
  ...
  dictionary: [
    "$initial_call": {Receiver, :init, 1},
    ...
  ],
  ...
]
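The full Process.info/1 dump is noisy. Below is a minimal sketch of a helper that keeps only the fields that tend to identify the code behind a process; ProcSummary and describe/1 are hypothetical names. It relies on the "$initial_call" entry that proc_lib (and therefore GenServer and Task) stores in the process dictionary, as seen for both processes in the output above:

```elixir
defmodule ProcSummary do
  # Hypothetical helper: keeps only the fields that usually identify the
  # code behind a process, instead of the full Process.info/1 dump.
  def describe(pid) when is_pid(pid) do
    case Process.info(pid, [:registered_name, :dictionary]) do
      # Process.info/2 returns nil once the process has exited.
      nil ->
        %{pid: pid, alive?: false}

      info ->
        %{
          pid: pid,
          alive?: true,
          # Erlang reports [] for processes without a registered name.
          registered_name: if(info[:registered_name] == [], do: nil, else: info[:registered_name]),
          # proc_lib-based processes (GenServer, Task, Agent, ...) record
          # their real entry point here; plain spawn/1 processes don't.
          initial_call: initial_call(info[:dictionary])
        }
    end
  end

  # The process dictionary may hold arbitrary keys, so use List.keyfind/3
  # rather than assuming a keyword list.
  defp initial_call(dictionary) do
    case List.keyfind(dictionary, :"$initial_call", 0) do
      {_, mfa} -> mfa
      nil -> nil
    end
  end
end
```

In Tracer.handle_info/2 you could then print inspect(ProcSummary.describe(sender)) and inspect(ProcSummary.describe(receiver)) instead of the full dumps.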

Depending on your particular situation, you may (though not necessarily) be able to pinpoint the modules powering these processes, for example from registered_name or the "$initial_call" entry in the process dictionary. AFAIK, it’s not possible to get the stack trace of the send itself, so you’ll need to analyze the code. Still, the info above should help you track down the problematic process/module.
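Because this setup enables send tracing for every process on the node, it’s worth switching it off as soon as you have what you need. A minimal sketch to run in the same iex session, assuming nothing else on the node relies on send tracing:

```elixir
# Remove the send trace flag from all processes on the node.
:erlang.trace(:processes, false, [:send])

# Drop the custom match spec; `true` restores the default send pattern
# (all sends are traced once send tracing is enabled again).
:erlang.trace_pattern(:send, true, [])
```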

For production-safe tracing, check out recon_trace and redbug (there’s also an Elixir wrapper, rexbug). You might also want to read this recent post on tracing in Elixir.
