Is it possible to use Rexbug to trace messages send/received by processes?

Background

We have an app the keeps running. This is a OTP application that has other OTP applications as dependencies. It uses GenServers and Supervisors as do many of its dependencies.

Problem

The app doesn’t crash, but it constantly prints the following error:

=ERROR REPORT==== 14-Jun-2019::14:31:35.089496 ===
Unexpected message: {#Ref<0.2607974300.3493068808.175359>,badarg}

Rexbug?

I was told about this library, but after reading the README page I could not find any examples of message tracing between processes. I know Rexbug is a wrapper for the erlang counterpart, but I am not familiar with it either.

Questions

  1. Can this library help me in my task?
  2. Is it suitable for our use case? If not, what are the alternatives?

:wave:

What’s your use case exactly?


Do you want to get the incoming messages for a process?

Then maybe :sys.trace/2. Also check out http://erlang.org/doc/design_principles/spec_proc.html. But it would be useless, since you already know the “faulty” message: {#Ref<0.2607974300.3493068808.175359>,badarg}.

Also note that unlike redbug it might cause performance problems if enabled on a busy process.

So I guess you want to either know the process that received the faulty message (that info should be printed somewhere in the rest of the error report):

iex(2)> {:ok, agent} = Agent.start_link(fn -> 0 end) # agents are genservers
{:ok, #PID<0.111.0>}
iex(3)> send(agent, {make_ref, :badarg})
{#Reference<0.2487050872.1550581763.207046>, :badarg}
08:01:48.305 [error] Agent.Server #PID<0.111.0> received unexpected message in handle_info/2: {#Reference<0.2487050872.1550581763.207046>, :badarg}

Or you want to know the process that sent it. Not sure about this one, but it should be possible to find out once you know the callback module for the process that received it, look for anything that calls make_ref and sends a message to an erlang lib that might reply with a :badarg.

I already tried :erlang.trace and :erlang.trace_pattern but none of them were able to catch the process responsible. There is a chance I am miss using these tools, so I was looking for something easier the use but with a similar level of power.

This is actually fine for us. We are ok with sinking a machine if it provides us with the data we need to fix all the others. We see it as a sacrificial lamb :smiley:
We are open to all avenues here.

There is no error report. The message I show in the original post is all we get to work with. The only clue OTP gives us.

Can you post your logger config here? And :extra_applictions from your mix.exs if possible.

Also please tell us which version of Elixir and OTP you are running @Fl4m3Ph03n1x, some of the nice Logger things are only available with more recent versions of both…

Just in case, here’s what I get in one of my apps running in a release built with erlang 21.0 and elixir 1.8 when I send one of the genservers an “unexpected” message:

[error] Worb.HiraganaSession #PID<0.2565.0> received unexpected message in handle_info/2: {#Reference<0.2424035366.4111728641.125681>, :badarg}
1 Like

I’m not really sure about redbug, but with dbg captureing a process that sends {ref, :badarg} messages would look something like:

# start the logger process
:dbg.tracer()
# set the match spec to only capture {ref, :badarg} messages
# will only trace messages received by receiver, can be :_ to trace all processes as receivers
:dbg.tpe(:send, [{[receiver, {:"$1", :badarg}],[{:is_reference,:"$1"}],[]}])
# start the trace on sender, can also be :processes to trace all processes as senders
:dbg.p(sender, [:r]); 

You’ll start receiving log messages in the shape of:

(<0.108.0>) <0.117.0> ! {#Ref<0.3800804343.3707502597.14282>,badarg}

Which means process <0.108.0> sent the above message to <0.117.0>.

3 Likes

Logger configs:

# Configures Elixir's Logger
config :logger, :console,
  format: "$date $time $metadata[$level] $message\n",
  metadata: [:request_id]

# Do not print debug messages in production
config :logger,
  level: :error,
  handle_otp_reports: false,
  handle_sasl_reports: false,
  truncate: :infinity

Elixir, OTP and erlang versions:

elixir 1.8.1-otp-21
erlang 21.2.6

So if I just place that in a remote iex shell connected to the application, it should work, right?
Will try it out for sure!

Try handling OTP and SASL reports via Logger, also enable more metadata, at least temporarily. If in doubt just use :all.

Then you’ll get nicer output and it’s in a single line rather then in the block like thing that you get now…

1 Like

You could match the process identifiers and MFA on send / receive ?