DynamicSupervisor not restarting child: why?


I’m writing my own toy implementation of a worker pool manager for learning purposes.

I do have one remaining issue (or at least only one I’ve identified :stuck_out_tongue:): when creating a new pool, a dynamic supervisor is used to start another (normal) supervisor that sits above all processes related to that pool. If I kill this supervisor, it doesn’t get restarted. Instead I get an error like

08:39:40.848 [error] GenServer #PID<0.4648.0> terminating
** (stop) killed
Last message: {:EXIT, #PID<0.4647.0>, :killed}
State: %PoolToy.PoolMan.State{idle_overflow: [], monitors: :"monitors_#PID<0.4648.0>", overflow: 0, overflow_max: 0, overflow_ttl: 0, overflow_ttl_timer: nil, size: 2, spec: PoolToy.Worker, sup: #PID<0.4647.0>, waiting: {[], []}, worker_sup: #PID<0.4649.0>, workers: [#PID<0.4652.0>, #PID<0.4651.0>]}
08:39:40.849 [error] GenServer #PID<0.4649.0> terminating
** (stop) killed
Last message: {:EXIT, #PID<0.4647.0>, :killed}
State: %DynamicSupervisor{args: [], children: %{#PID<0.4651.0> => {{PoolToy.Worker, :start_link, :undefined}, :temporary, 5000, :worker, [PoolToy.Worker]}, #PID<0.4652.0> => {{PoolToy.Worker, :start_link, :undefined}, :temporary, 5000, :worker, [PoolToy.Worker]}}, dynamic: 2, extra_arguments: [], max_children: :infinity, max_restarts: 3, max_seconds: 5, mod: PoolToy.WorkerSup, name: {#PID<0.4649.0>, PoolToy.WorkerSup}, restarts: [], strategy: :one_for_one}

I don’t know what code would be relevant to include here, so I’ve pushed the current state to GitHub: https://github.com/davidsulc/pool_toy. You can download the code from https://github.com/davidsulc/pool_toy/archive/master.zip

Here are the steps to reproduce the issue (within the pool_toy directory):

# start the app
iex -S mix

# start a pool named :pool_a with 2 workers
iex> PoolToy.start_pool(PoolToy.Worker, 2, name: :pool_a)

# start the observer
iex> :observer.start

Within the Applications tab of the observer, right-click on the direct child of the named Elixir.PoolToy.PoolsSup process (it will have no name, just a pid) and kill it. I would expect the process to get restarted by the Elixir.PoolToy.PoolsSup dynamic supervisor, but that’s not the case. Why?

Your child’s restart value is :temporary, which means it won’t be restarted even if it fails. Take a look at this: https://hexdocs.pm/elixir/Supervisor.Spec.html#module-restart-values-restart to see which restart value you should use.
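To make the difference concrete, here is a minimal sketch (not taken from pool_toy) that starts one :temporary and one :permanent child under a DynamicSupervisor and kills both. The Agent children and the `spec` helper are stand-ins I made up for illustration; only the :permanent child should come back.

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Hypothetical helper: builds a child spec map with the given restart value.
spec = fn restart ->
  %{id: Agent, start: {Agent, :start_link, [fn -> :ok end]}, restart: restart}
end

{:ok, temp} = DynamicSupervisor.start_child(sup, spec.(:temporary))
{:ok, perm} = DynamicSupervisor.start_child(sup, spec.(:permanent))

Process.exit(temp, :kill)
Process.exit(perm, :kill)

# Give the supervisor a moment to process both exits.
Process.sleep(100)

# Only the replacement for the :permanent child should be listed here.
IO.inspect(DynamicSupervisor.which_children(sup))
```

The :temporary child is simply dropped from the supervisor once it dies, which matches what the observer shows in the question.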


The dynamic supervisor should be as dumb as possible.

If you are still following the poolboy example, there is a GenServer that uses a simple_one_for_one supervisor to start temporary workers. This GenServer is linked to those temporary workers, and thus knows when they die. It can then take appropriate measures.

In the DynamicSupervisor scenario, it would be a GenServer that starts worker processes through a dynamic supervisor and links to each worker process.

Workers are temporary in poolboy because they are not restarted by the supervisor, even if in your case the workers are also supervisors.
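A rough sketch of that arrangement, under my own assumptions: `Sketch.PoolMan` is a hypothetical name, the workers are plain Agents, and the pool holds a single worker to keep things short. The manager traps exits, links to each worker it starts through the dynamic supervisor, and starts a replacement itself when one dies.

```elixir
defmodule Sketch.PoolMan do
  use GenServer

  def start_link(worker_sup), do: GenServer.start_link(__MODULE__, worker_sup)

  def workers(man), do: GenServer.call(man, :workers)

  @impl true
  def init(worker_sup) do
    # Trap exits so a dying worker arrives as an {:EXIT, pid, reason} message.
    Process.flag(:trap_exit, true)
    {:ok, add_worker(%{worker_sup: worker_sup, workers: []})}
  end

  @impl true
  def handle_call(:workers, _from, state), do: {:reply, state.workers, state}

  @impl true
  def handle_info({:EXIT, pid, _reason}, state) do
    if pid in state.workers do
      # The supervisor will not restart a :temporary worker, so the manager does.
      {:noreply, add_worker(%{state | workers: List.delete(state.workers, pid)})}
    else
      {:noreply, state}
    end
  end

  defp add_worker(state) do
    spec = %{id: :worker, start: {Agent, :start_link, [fn -> :ok end]}, restart: :temporary}
    {:ok, pid} = DynamicSupervisor.start_child(state.worker_sup, spec)
    Process.link(pid)
    %{state | workers: [pid | state.workers]}
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, man} = Sketch.PoolMan.start_link(sup)

[old] = Sketch.PoolMan.workers(man)
Process.exit(old, :kill)
Process.sleep(100)

# The manager should now hold a different, live worker pid.
[new] = Sketch.PoolMan.workers(man)
IO.inspect({old, new})
```

The supervisor stays "dumb" here: it only starts children and cleans them up on shutdown, while the decision to replace a worker lives in the manager.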

Having cleared my head, @minhajuddin put me on the right track: I got confused between the restart value given to the use macro and the restart settings given in the init function of the Supervisor/DynamicSupervisor modules…

Of course, the value given to use does NOT determine how/if children are restarted: it sets the module’s own restart value, since it’s used to create the module’s child spec. The settings given in init, on the other hand, determine how/if the supervisor’s children are restarted. Confusing the two was the cause of my downfall :stuck_out_tongue:
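For anyone landing here later, a small sketch of what tripped me up. `Demo.PoolSup` is a made-up module, not the actual pool_toy code. The `:restart` option passed to `use` ends up in the module's own child spec, so it tells the parent supervisor how to treat this module and says nothing about this module's children.

```elixir
defmodule Demo.PoolSup do
  # This option is baked into Demo.PoolSup.child_spec/1. It tells the
  # *parent* supervisor never to restart Demo.PoolSup.
  use Supervisor, restart: :temporary

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(_arg) do
    # Settings here apply to Demo.PoolSup's own children (none in this sketch).
    Supervisor.init([], strategy: :one_for_one)
  end
end

# The restart value the parent will see by default.
IO.inspect(Demo.PoolSup.child_spec([]))

# The parent can still override it when building the spec.
IO.inspect(Supervisor.child_spec(Demo.PoolSup, restart: :permanent))
```

Dropping the `restart: :temporary` option from `use` (or overriding it as in the last line) is what lets the parent dynamic supervisor restart the pool supervisor.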

Thanks @minhajuddin and @kokolegorille for your help!