Question about code from The Little Elixir & OTP Guidebook

voger · February 6, 2017, 11:10pm

I apologize in advance if this is not the proper place to ask questions about code from books, but I don’t know where else to ask, and I believe a lot of people here have read it and will be familiar with the code. If there is a more appropriate place, please point me there.

Here is the question:

In chapter 6 the author creates a worker supervisor with this code:

def init({m, f, a} = x) do
  worker_opts = [restart: :permanent,
                 function: f]
  children = [worker(m, a, worker_opts)]

  opts = [strategy: :simple_one_for_one,
          max_restarts: 5,
          max_seconds: 5]

  supervise(children, opts)
end

and in chapter 7 he creates a handle_info function in server.ex and makes the server trap exits with Process.flag(:trap_exit, true):

def handle_info({:EXIT, pid, _reason}, state = %{monitors: monitors, workers: workers, worker_sup: worker_sup}) do
  case :ets.lookup(monitors, pid) do
    [{pid, ref}] ->
      true = Process.demonitor(ref)
      true = :ets.delete(monitors, pid)
      new_state = %{state | workers: [new_worker(worker_sup)|workers]}
      {:noreply, new_state}

    [] ->
      {:noreply, state}
  end
end

defp new_worker(sup) do
  {:ok, worker} = Supervisor.start_child(sup, [[]])
  Process.link(worker)
  worker
end

I don’t understand why we need to create a new worker process in this line of handle_info:

new_state = %{state | workers: [new_worker(worker_sup)|workers]}

Since the worker has the restart: :permanent option and the supervisor has strategy: :simple_one_for_one, the crashed worker will be restarted anyway. Why do we need a call to Supervisor.start_child(sup, [[]])?

Let’s say we start with 5 workers. We crash one. Because the worker is restart: :permanent under a :simple_one_for_one supervisor, we get a new worker. Then we call Supervisor.start_child(sup, [[]]). Wouldn’t this give us a 6th worker? I know it doesn’t, but why?

A theory I have is: because of the handle_info({:EXIT, ...}) clause, the server is the first to handle the crash, before the supervisor has a chance to do anything. When it is the supervisor’s turn, it sees that it still has 5 workers, so it’s all good.

UPDATE: I tried setting the worker’s restart option to restart: :temporary, and this time no new process replaced the crashed process, even though Supervisor.start_child is called. So my theory doesn’t hold.

UPDATE 2: I also tried simply returning {:noreply, state} (without %{state | workers: [new_worker(worker_sup)|workers]}) and the process still gets respawned.
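
One way to check the worker count directly, rather than inferring it, is to ask the supervisor with :supervisor.count_children/1. The sketch below is not the book's code. It assumes a stand-in supervisor (the hypothetical SketchSup module, written against Erlang's :supervisor behaviour) with the same strategy and restart settings, and uses plain Agent processes as stand-in workers. It only shows what a :permanent worker under :simple_one_for_one does in isolation when one child is killed and start_child is then called by hand.

```elixir
# Hypothetical stand-in for the book's worker supervisor (not the book's code).
defmodule SketchSup do
  @behaviour :supervisor

  def start_link, do: :supervisor.start_link(__MODULE__, [])

  def init(_args) do
    # Same settings as in the question: simple_one_for_one, 5 restarts in 5 seconds.
    flags = %{strategy: :simple_one_for_one, intensity: 5, period: 5}

    # Stand-in worker: an Agent holding a dummy value, restarted permanently.
    child = %{
      id: :worker,
      start: {Agent, :start_link, [fn -> :idle end]},
      restart: :permanent
    }

    {:ok, {flags, [child]}}
  end
end

{:ok, sup} = SketchSup.start_link()

# Start 5 workers, as in the question.
workers =
  for _ <- 1..5 do
    {:ok, pid} = :supervisor.start_child(sup, [])
    pid
  end

active = fn -> :supervisor.count_children(sup)[:active] end

# Kill one worker and wait until it is really gone.
victim = hd(workers)
ref = Process.monitor(victim)
Process.exit(victim, :kill)

receive do
  {:DOWN, ^ref, :process, ^victim, _reason} -> :ok
end

# Give the supervisor a moment to perform its own restart.
Process.sleep(100)
after_restart = active.()

# Now do what new_worker/1 does: start one more child by hand.
{:ok, _extra} = :supervisor.start_child(sup, [])
after_manual_start = active.()

IO.inspect({after_restart, after_manual_start}, label: "active workers")
```

Comparing the two numbers in the actual pool (for example by calling Supervisor.count_children/1 on the worker supervisor before and after the crash) shows whether the supervisor restart, the manual start_child, or both took place.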

wfgilman · February 6, 2017, 11:35pm

I thought the :EXIT signal was for a process that exited safely, not a crash. In that case the supervisor wouldn’t restart the child. I thought :DOWN was for unexpected terminations and the supervisor would restart those processes.

voger · February 7, 2017, 1:00pm

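:EXIT and :DOWN do not distinguish between forced and normal exits; they distinguish between links and monitors. An :EXIT message comes from a linked process (when the receiver traps exits), and a :DOWN message comes from a monitored process. In that code, :DOWN is used to monitor the client process, the one that requested one of our worker processes from the pool.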
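
A minimal sketch of that distinction, not taken from the book: the `await` helper is a hypothetical name used only here, and the spawned functions are throwaway examples. It shows that the message type depends on whether the process was linked or monitored, not on whether it exited normally or crashed.

```elixir
# Trap exits so that exit signals from linked processes arrive as messages.
Process.flag(:trap_exit, true)

# Hypothetical helper: wait for the next :EXIT or :DOWN message
# and return its type together with the exit reason.
await = fn ->
  receive do
    {:EXIT, _pid, reason} -> {:EXIT, reason}
    {:DOWN, _ref, :process, _pid, reason} -> {:DOWN, reason}
  after
    1_000 -> :timeout
  end
end

# Linked processes: both a normal exit and a crash produce :EXIT.
spawn_link(fn -> :ok end)
linked_normal = await.()

spawn_link(fn -> exit(:boom) end)
linked_crash = await.()

# Monitored processes: both a normal exit and a crash produce :DOWN.
spawn_monitor(fn -> :ok end)
monitored_normal = await.()

spawn_monitor(fn -> exit(:boom) end)
monitored_crash = await.()

IO.inspect([linked_normal, linked_crash, monitored_normal, monitored_crash])
```

In both pairs only the reason differs (:normal versus :boom); the tag is decided by the relationship between the two processes.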