I am trying to handle Oban telemetry events in my specific case, but my question applies to telemetry handlers in general. During development, the telemetry handler would crash and be detached, logging the following error:
Handler “oban-logger” has failed and has been detached
The handler does not re-attach itself automatically, and the server needs to be restarted in order to continue development. This would be disastrous if it happened in production.
Is there a way to get the kind of robustness that supervisors provide (automatic restart on failure)? What is the correct way of handling errors in telemetry handlers? My current understanding is that you have to make sure your handler function can never crash, but this kind of goes against the ‘let it crash’ mantra of Erlang and Elixir.
Below is a part of my source code for context.
telemetry.ex
@impl Supervisor
def init(_arg) do
  children = [
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000},
    {FacadeScan.ObanLogger,
     events: [[:oban, :job, :start], [:oban, :job, :stop], [:oban, :job, :exception]]}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
oban_logger.ex
@impl GenServer
def init(events) do
  # Trap exits so that terminate/2 runs when the supervisor shuts this process down
  Process.flag(:trap_exit, true)
  # https://hexdocs.pm/oban/Oban.html#module-instrumentation-error-reporting-and-logging
  :telemetry.attach_many("oban-logger", events, &FacadeScan.ObanLogger.handle_event/4, [])
  {:ok, events}
end

@impl GenServer
def terminate(_reason, _events) do
  # attach_many/4 registered one handler under the id "oban-logger",
  # so it has to be detached with that same id
  :telemetry.detach("oban-logger")
  :ok
end
def handle_event(
      [:oban, :job, :exception],
      measure,
      %{worker: "FacadeScan.ImageProcessor"} = meta,
      _
    ) do
  # This might crash
end
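To make the "handler that can't crash" idea from my question concrete, here is a minimal sketch of what I understand it to mean. The module name `SafeObanLogger`, the private `do_handle_event/4` helper and the use of `Map.fetch!/2` as the "might crash" step are all made up for illustration and are not part of my actual code. The public function keeps the 4-arity shape that `:telemetry` expects from a handler, wraps the real work in `rescue`/`catch`, and includes a catch-all clause so that events without a matching clause do not raise a `FunctionClauseError`.

```elixir
defmodule SafeObanLogger do
  require Logger

  # Hypothetical wrapper with the same arity as a :telemetry handler.
  # Any raise, throw or exit inside do_handle_event/4 is logged here
  # instead of propagating to the caller.
  def handle_event(event, measurements, meta, config) do
    do_handle_event(event, measurements, meta, config)
  rescue
    error ->
      Logger.error(
        "handler failed for #{inspect(event)}: " <>
          Exception.format(:error, error, __STACKTRACE__)
      )

      :ok
  catch
    kind, reason ->
      Logger.error(
        "handler failed for #{inspect(event)}: " <>
          Exception.format(kind, reason, __STACKTRACE__)
      )

      :ok
  end

  # Stand-in for "this might crash": Map.fetch!/2 raises KeyError
  # when :duration is missing from the measurements.
  defp do_handle_event([:oban, :job, :exception], measurements, meta, _config) do
    duration = Map.fetch!(measurements, :duration)
    Logger.warning("job #{inspect(meta[:worker])} failed after #{duration}")
    :ok
  end

  # Catch-all clause: ignore every event that has no dedicated clause.
  defp do_handle_event(_event, _measurements, _meta, _config), do: :ok
end
```

With something like this, `&SafeObanLogger.handle_event/4` would be passed to `:telemetry.attach_many/4` in place of the original handler. It keeps the handler attached, but it is exactly the defensive style I was hoping to avoid, hence the question.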