How to debug telemetry handlers that have errored

Is there a way to get Telemetry to output the full exception when a handler crashes?

It’s a little difficult to debug when you only get a message like this:

[error] Handler "my-app-handler-id" has failed and has been detached. Class=:error
Reason=:undef
Stacktrace=[
  {Logger, :info, ["GET /posts served in 2ms"], []},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
  ...

In particular the Reason=:undef feels cryptic compared to a normally formatted exception, but also it would be easier to read the stacktrace if it was formatted as if from Exception.format_stacktrace/1.

Also is there an easy way to re-attach the handler so I can test the updated code without restarting the server?

That’s erlang errors in a nutshell – it’s the error for module or function undefined. We usually don’t see them because of the elixir logger translator if this bubbles up, but telemetry doesn’t let this bubble to not break the place the handler was called in. And with it being an erlang library it also cannot use any of the elixir level translation after having cought the error.

What might be possible is having an telemetry_translator elixir library, which can translate the telemetry logs specifically by extracting the relevant data.

1 Like

How would that look? What are the example inputs it has to handle? E.g. just the :undef atom?

but telemetry doesn’t let this bubble to not break the place the handler was called in

Hmm, then maybe what I’m asking/looking for is for :telemetry to support a configuration (intended for development) that does let the error bubble so that we can get a full error message.

I don’t know if that is idiomatic, however the way I used to debug this back in the day was to wrap the handler code in a try/catch block and just print the error in the console.

2 Likes

The elixir translator is implemented as a logger filter, which transforms the log messages it receives. You can find information around how those filters work over in the otp docs for logger.

Ah, that’s a great tip! I forget how I introduced the original error in the OP but by adding this to the end my telemetry handler:

  rescue
    e ->
      Logger.error(Exception.format(:error, e, __STACKTRACE__))
  end

I can turn this error:

[error] Handler "my-app-handler-id" has failed and has been detached. Class=:error
Reason=%Protocol.UndefinedError{protocol: Enumerable, value: nil, description: ""}
Stacktrace=[
  {Enumerable, :impl_for!, 1, [file: ~c"lib/enum.ex", line: 1]},
  {Enumerable, :reduce, 3, [file: ~c"lib/enum.ex", line: 166]},
  {Enum, :map, 2, [file: ~c"lib/enum.ex", line: 4423]},
  {MyAppWeb.Telemetry, :handle_event, 4,
   [file: ~c"lib/my_app_web/telemetry.ex", line: 127]},
  {:telemetry, :"-execute/3-fun-0-", 4,
   [
     file: ~c"/Users/jason/dev/my_app/deps/telemetry/src/telemetry.erl",
     line: 167
   ]},
  {:lists, :foreach_1, 2, [file: ~c"lists.erl", line: 1686]},
  {Plug.Telemetry, :"-call/2-fun-0-", 4,
   [file: ~c"lib/plug/telemetry.ex", line: 76]},
  {Enum, :"-reduce/3-lists^foldl/2-0-", 3, [file: ~c"lib/enum.ex", line: 2531]},
  {Plug.Conn, :run_before_send, 2, [file: ~c"lib/plug/conn.ex", line: 1850]},
  {Plug.Conn, :send_resp, 1, [file: ~c"lib/plug/conn.ex", line: 441]},
  {MyAppWeb.PostController, :action, 2,
   [file: ~c"lib/my_app_web/controllers/post_controller.ex", line: 1]},
  {MyAppWeb.PostController, :phoenix_controller_pipeline, 2,
   [file: ~c"lib/my_app_web/controllers/post_controller.ex", line: 1]},
  {Phoenix.Router, :__call__, 5, [file: ~c"lib/phoenix/router.ex", line: 484]},
  {MyAppWeb.Endpoint, :plug_builder_call, 2,
   [file: ~c"lib/my_app_web/endpoint.ex", line: 1]},
  {MyAppWeb.Endpoint, :"call (overridable 3)", 2,
   [file: ~c"/Users/jason/dev/my_app/deps/plug/lib/plug/debugger.ex", line: 136]},
  {MyAppWeb.Endpoint, :call, 2, [file: ~c"lib/my_app_web/endpoint.ex", line: 1]},
  {Phoenix.Endpoint.SyncCodeReloadPlug, :do_call, 4,
   [file: ~c"lib/phoenix/endpoint/sync_code_reload_plug.ex", line: 22]},
  {Bandit.Pipeline, :call_plug!, 2,
   [file: ~c"lib/bandit/pipeline.ex", line: 124]},
  {Bandit.Pipeline, :run, 4, [file: ~c"lib/bandit/pipeline.ex", line: 36]},
  {Bandit.HTTP1.Handler, :handle_data, 3,
   [file: ~c"lib/bandit/http1/handler.ex", line: 12]}
]

Into this much easier to read error:

[error] ** (Protocol.UndefinedError) protocol Enumerable not implemented for nil of type Atom. This protocol is implemented for the following type(s): DBConnection.PrepareStream, DBConnection.Stream, Date.Range, Ecto.Adapters.SQL.Stream, File.Stream, Function, GenEvent.Stream, HashDict, HashSet, IO.Stream, Jason.OrderedObject, List, Map, MapSet, Phoenix.LiveView.LiveStream, Postgrex.Stream, Range, Stream
    (elixir 1.17.2) lib/enum.ex:1: Enumerable.impl_for!/1
    (elixir 1.17.2) lib/enum.ex:166: Enumerable.reduce/3
    (elixir 1.17.2) lib/enum.ex:4423: Enum.map/2
    (my_app 0.1.0) lib/my_app_web/telemetry.ex:127: MyAppWeb.Telemetry.handle_event/4
    (telemetry 1.3.0) /Users/jason/dev/my_app/deps/telemetry/src/telemetry.erl:167: anonymous fn/4 in :telemetry.execute/3
    (stdlib 5.2.3.1) lists.erl:1686: :lists.foreach_1/2
    (plug 1.16.1) lib/plug/telemetry.ex:76: anonymous fn/4 in Plug.Telemetry.call/2
    (elixir 1.17.2) lib/enum.ex:2531: Enum."-reduce/3-lists^foldl/2-0-"/3
    (plug 1.16.1) lib/plug/conn.ex:1850: Plug.Conn.run_before_send/2
    (plug 1.16.1) lib/plug/conn.ex:441: Plug.Conn.send_resp/1
    (my_app 0.1.0) lib/my_app_web/controllers/post_controller.ex:1: MyAppWeb.PostController.action/2
    (my_app 0.1.0) lib/my_app_web/controllers/post_controller.ex:1: MyAppWeb.PostController.phoenix_controller_pipeline/2
    (phoenix 1.7.14) lib/phoenix/router.ex:484: Phoenix.Router.__call__/5
    (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
    (my_app 0.1.0) /Users/jason/dev/my_app/deps/plug/lib/plug/debugger.ex:136: MyAppWeb.Endpoint."call (overridable 3)"/2
    (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
    (phoenix 1.7.14) lib/phoenix/endpoint/sync_code_reload_plug.ex:22: Phoenix.Endpoint.SyncCodeReloadPlug.do_call/4
    (bandit 1.5.7) lib/bandit/pipeline.ex:124: Bandit.Pipeline.call_plug!/2
    (bandit 1.5.7) lib/bandit/pipeline.ex:36: Bandit.Pipeline.run/4
    (bandit 1.5.7) lib/bandit/http1/handler.ex:12: Bandit.HTTP1.Handler.handle_data/3

And I think with the original Reason=:undef error the difference would be even more drastic!

I’ve marked your post as the solution since it’s a very practical way to approach this, but I’d still love an easier way to debug this without having to modify the handler itself.

Now I just need to remember to not accidentally commit that change :laughing:

1 Like

What is a “full error message” though. What you’re seeing is afaik all you get for erlang errors. It’s also all the elixir logger translator would see and turn into a more friendly sounding exception, but still without more details.

Elixir however also has the Exception.blame callback, which for exceptions can be used to include more context, which might be too expensive to produce in production environments. That’s again a pure elixir concept though.

That’s a great tip, especially given I doubt telemetry would add an option for customizing this.

With reraise you even get the same behaviour against telemetry instead of hiding the failure.

1 Like

By “full error message” I mean:

“(Protocol.UndefinedError) protocol Enumerable not implemented for nil of type Atom. This protocol is implemented for the following type(s): DBConnection.PrepareStream, DBConnection.Stream, Date.Range, Ecto.Adapters.SQL.Stream, File.Stream, Function, GenEvent.Stream, HashDict, HashSet, IO.Stream, Jason.OrderedObject, List, Map, MapSet, Phoenix.LiveView.LiveStream, Postgrex.Stream, Range, Stream”

instead of:

“Reason=%Protocol.UndefinedError{protocol: Enumerable, value: nil, description: “”}”

The second isn’t really even an error message (in my view) I’d consider it just an “error” since there’s no “message” describing the error.

A better name would probably be an “elixir-formatted” error message.

Yeah adding reraise definitely makes sense if you’re adding this to production. But for me in development I’d skip the reraise since otherwise if there’s many requests happening then I might miss the initial disconnection error due to all the other log output. Of course the downside is that the telemetry handler may be printing lots of errors, but generally I want errors to be quite noisy in dev so that I can find and fix them.

1 Like

You can make it as fancy as you want it to be, for example you could set a config value to enable/disable this. Then all you have to do is have a wrapper function over all your handlers.

1 Like

If that’s true then maybe the best approach would be a dev-only Telemetry library that is drop-in compatible with :telemetry.