I apologize in advance if this is not the proper place to ask questions about code from books, but I don’t know where else to ask, and I believe a lot of people here have read the book and will be familiar with the code. If there is a more appropriate place, please point me there.
Here is the question:
In chapter 6 the author creates a worker supervisor with this code:
    def init({m, f, a} = x) do
      worker_opts = [restart: :permanent,
                     function: f]
      children = [worker(m, a, worker_opts)]
      opts = [strategy: :simple_one_for_one,
              max_restarts: 5,
              max_seconds: 5
             ]
      supervise(children, opts)
    end
In chapter 7 he adds a handle_info function to server.ex and makes the server trap exits with Process.flag(:trap_exit, true):
    def handle_info({:EXIT, pid, _reason}, state = %{monitors: monitors, workers: workers, worker_sup: worker_sup}) do
      case :ets.lookup(monitors, pid) do
        [{pid, ref}] ->
          true = Process.demonitor(ref)
          true = :ets.delete(monitors, pid)
          new_state = %{state | workers: [new_worker(worker_sup)|workers]}
          {:noreply, new_state}
        [] ->
          {:noreply, state}
      end
      {:noreply, state}
    end

    defp new_worker(sup) do
      {:ok, worker} = Supervisor.start_child(sup, [[]])
      Process.link(worker)
      worker
    end
I don’t understand why we need to create a new worker process in this line of handle_info:

    new_state = %{state | workers: [new_worker(worker_sup)|workers]}

Since the worker has the restart: :permanent option and the supervisor has strategy: :simple_one_for_one, the crashed worker will be restarted anyway. Why do we need the call to Supervisor.start_child(sup, [[]])?
Let’s say we start with 5 workers and crash one. Because of strategy: :simple_one_for_one, we get a new worker. Then we call Supervisor.start_child(sup, [[]]). Wouldn’t this give us a 6th worker? I know it doesn’t, but why?
A theory I have is this: because of the handle_info({:EXIT, ...}) clause, the server is the first to handle the crash, before the supervisor has a chance to do anything. When it comes to the supervisor’s turn, it sees that it still has 5 workers, so it’s all good.
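One way to check the "6th worker" question directly would be to count the supervisor's children before and after a crash. The following is only a minimal sketch and not the book's code: it talks to Erlang's :supervisor module directly instead of using worker/3 and supervise/2, and the module names CountDemo.Worker and CountDemo.Sup are made up for illustration. It mirrors the restart: :permanent and :simple_one_for_one settings from the chapter 6 code, but leaves out the server and its handle_info entirely.

```elixir
defmodule CountDemo.Worker do
  # Hypothetical stand-in for the book's worker: a GenServer that does nothing.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule CountDemo.Sup do
  # Hypothetical supervisor built on Erlang's :supervisor behaviour,
  # using the same strategy and restart limits as the chapter 6 code.
  @behaviour :supervisor

  def start_link, do: :supervisor.start_link(__MODULE__, [])

  @impl true
  def init(_args) do
    sup_flags = %{strategy: :simple_one_for_one, intensity: 5, period: 5}

    child = %{
      id: CountDemo.Worker,
      start: {CountDemo.Worker, :start_link, []},
      restart: :permanent
    }

    {:ok, {sup_flags, [child]}}
  end
end

# Small helper: number of live children the supervisor currently reports.
active = fn sup -> :supervisor.count_children(sup)[:active] end

{:ok, sup} = CountDemo.Sup.start_link()

# Start 5 workers, as in the scenario described above.
workers =
  for _ <- 1..5 do
    {:ok, pid} = :supervisor.start_child(sup, [[]])
    pid
  end

IO.inspect(active.(sup), label: "active before crash")

# Crash one worker, then give the supervisor a moment to react.
Process.exit(hd(workers), :kill)
Process.sleep(100)
IO.inspect(active.(sup), label: "active after crash")

# Now do what new_worker/1 does and start one more child by hand.
{:ok, _pid} = :supervisor.start_child(sup, [[]])
IO.inspect(active.(sup), label: "active after extra start_child")
```

Comparing the three printed counts should show whether the supervisor alone replaces the crashed worker, and whether the extra start_child call adds one more on top of that.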
UPDATE: I tried setting the worker's restart option to restart: :temporary, and this time no new process replaced the crashed process, even though Supervisor.start_child is called. So my theory doesn’t hold.
UPDATE 2: I also tried simply returning {:noreply, state} (without %{state | workers: [new_worker(worker_sup)|workers]}), and the process still gets respawned.