Trapping exit reason in Supervisor

I need help with what seems to be a trivial task. My application root is a Supervisor, and it starts a couple of GenServer-based workers. I am trying to find a way to catch a worker's termination reason in my Supervisor, but I can’t find a way to do this. After reading through Hexdocs I figured out that I can define a terminate callback in a GenServer, but it is not always guaranteed to be called. As the supervisor is responsible for restarting terminated children based on the exit reason (:normal, :shutdown, …), I assume there should be a way to catch it.
Thanks in advance for your help.

Maybe you can override handle_info/2 in your supervisor module?

require Logger

def handle_info({:EXIT, pid, reason}, state) do
  # reason can be any term (e.g. a tuple), so inspect it before interpolating
  Logger.info("A child process died: #{inspect(reason)}")
  case restart_child(pid, reason, state) do # it's a private function ... that's a problem
    {:ok, state} ->
      {:noreply, state}
    {:shutdown, state} ->
      {:stop, :shutdown, state}
  end
end

def handle_info(msg, state) do
  Logger.error("Supervisor received unexpected message: #{inspect(msg)}")
  {:noreply, state}
end

I see. I didn’t find any reference to handle_info in the Elixir docs; let me see if I can get it to work…

It should be documented in there somewhere. handle_info is the callback that handles any message the process receives that is not a GenServer call or cast.
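As a minimal sketch of that distinction in an ordinary GenServer (not a Supervisor), the hypothetical EchoServer module below records every plain message in handle_info/2 and exposes them through a call. The module name and the :messages request are made up for illustration.

```elixir
defmodule EchoServer do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl true
  def init(messages), do: {:ok, messages}

  # Plain messages (send/2, monitor :DOWN messages, trapped exits, ...)
  # arrive here. The state is a list with the newest message first.
  @impl true
  def handle_info(msg, messages), do: {:noreply, [msg | messages]}

  # GenServer.call/2 requests go through handle_call/3 instead.
  @impl true
  def handle_call(:messages, _from, messages) do
    {:reply, Enum.reverse(messages), messages}
  end
end

{:ok, pid} = EchoServer.start_link()
send(pid, :hello)
GenServer.call(pid, :messages)
```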

Supervisors are critical to the application’s reliability, so it is recommended to always use OTP’s standard (proven, simple, robust) supervisors rather than roll your own.

If some part of your application needs to be notified when processes terminate, add a GenServer that uses Process.monitor/1 to track other processes. This is orthogonal to the supervision hierarchy and therefore any issues in your monitoring code will not impact supervision.
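A rough sketch of such a monitoring process could look like the following. CrashReporter, watch/2 and reasons/1 are hypothetical names chosen for this example; only Process.monitor/1 and the shape of the :DOWN message come from the standard library.

```elixir
defmodule CrashReporter do
  use GenServer
  require Logger

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  # Ask the reporter to monitor `pid`. Returns :ok once the monitor is set.
  def watch(reporter, pid), do: GenServer.call(reporter, {:watch, pid})

  # Exit reasons recorded so far, oldest first.
  def reasons(reporter), do: GenServer.call(reporter, :reasons)

  @impl true
  def init(seen), do: {:ok, seen}

  @impl true
  def handle_call({:watch, pid}, _from, seen) do
    # If pid is already dead, a :DOWN message with reason :noproc follows.
    Process.monitor(pid)
    {:reply, :ok, seen}
  end

  def handle_call(:reasons, _from, seen), do: {:reply, Enum.reverse(seen), seen}

  # Delivered once for every monitored process that terminates.
  @impl true
  def handle_info({:DOWN, _ref, :process, pid, reason}, seen) do
    Logger.warning("#{inspect(pid)} exited with reason #{inspect(reason)}")
    {:noreply, [{pid, reason} | seen]}
  end

  def handle_info(_other, seen), do: {:noreply, seen}
end
```

A worker could register itself from its own init/1 with CrashReporter.watch(reporter, self()), assuming the reporter is started before the workers. Note that a monitor only reports the exit reason; the worker's state at the time of the crash is not included.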


I gave handle_info a try. As you commented in the code, restart_child/3 is private; there is a restart_child/2 version, but in order to use it I need to keep track of the child id somewhere.
A couple of questions, though:

  1. Is there any disadvantage to using start_link to do the restart?
  2. If I add a “supervisor in the middle”, i.e. have my root Supervisor start a “monitoring” Supervisor that in turn starts the GenServer process, then when the GenServer terminates, the “monitoring” Supervisor would receive the message in handle_info, perform the necessary logging and terminate, causing the root Supervisor to restart it, which in turn starts the GenServer again. Is this reasonable?

Thanks

Thanks voltone, I already have Process.spawn_monitor in place to monitor child processes, but I was not sure if this is the best way of trapping crashes. I got the impression that a Supervisor is the intended way of handling process exits. Are there any good guidelines on when to use a Supervisor and when to use monitors?

Thanks ScrimpyCat, I was not aware that a Supervisor is a GenServer. Thanks to this https://medium.com/@StevenLeiva1/elixir-supervisors-a-conceptual-understanding-ee0825f70cbe it is now clear.

It depends on what you’re trying to achieve. Trapping exits is a tool, but what is the goal?

If you’re trying to control the lifecycle of the process, by deciding whether to restart it or escalate the error to a higher level supervisor, then that’s indeed the role of a supervisor. In that case you should first check if you can configure the standard supervisor strategies and parameters to achieve the desired behavior.
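To illustrate what configuring the standard supervisor can cover, here is a sketch with hypothetical module names (MyApp.Supervisor, MyApp.Worker) and arbitrary example limits. It uses the :transient restart value, so the child is restarted only after an abnormal exit, together with max_restarts and max_seconds, which decide when the supervisor gives up and exits itself, escalating the failure to its own parent.

```elixir
defmodule MyApp.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(_arg) do
    children = [
      # :transient -> restart only on abnormal exits, i.e. anything other
      # than :normal, :shutdown or {:shutdown, term}.
      Supervisor.child_spec({MyApp.Worker, []}, restart: :transient)
    ]

    # More than 3 restarts within 5 seconds makes this supervisor exit,
    # which hands the problem to the next supervisor up the tree.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end
```

The other restart values are :permanent (always restart, the default) and :temporary (never restart).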

If you want better visibility into crashes and their reasons, consider using OTP’s built-in SASL application: its reports can be enabled through your application’s Logger configuration. Or monitor the processes (under a standard supervisor) from another process.

There might be cases where you do indeed need to implement a custom supervisor, but my advice is to make sure you’ve exhausted all other options first.

Thanks voltone, I got your point. In my case it is more about error reporting when a crash has occurred and the ability to log the state prior to the crash, so I guess a custom supervisor would be overkill.

It is my understanding that SASL Error Logging will capture the exit reason as part of the Supervisor Report. So it’s a matter of enabling SASL logging and capturing the log with an appropriate backend.
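For reference, a config fragment along these lines should turn that on. It assumes a standard Mix project with a config/config.exs file; handle_sasl_reports is a boot-time Logger option, so it has to be set before the :logger application starts.

```elixir
# config/config.exs
import Config

# Redirect OTP's SASL reports (supervisor, crash and progress reports)
# to Logger, so the configured Logger backends receive them.
config :logger, handle_sasl_reports: true
```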


I am pretty late here, but I’m afraid one cannot override the Supervisor’s handle_info/2, nor handle_call and handle_cast, because the Supervisor behaviour does not define those callbacks.

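defmodule Sup do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(_arg), do: Supervisor.init([], strategy: :one_for_one)

  # Never invoked: the supervisor process handles its own messages
  def handle_info(any, state) do
    IO.inspect(any)
    {:noreply, state}
  end
end

# try messaging
{:ok, sup_pid} = Sup.start_link([])
send(sup_pid, :hello)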

Trying the above does not work; it logs something like
01:52:12.737 [error] Supervisor received unexpected message: :hello

The handle_info and handle_call clauses that the Supervisor can receive seem to be defined here, and Elixir is just calling that inside def which_children, do: ....
