There are a few considerations here:
- A supervisor crashing is generally the result of its children crashing repeatedly: by default a supervisor gives up and exits once more than 3 restarts happen within 5 seconds (`max_restarts`/`max_seconds`). Bringing those children back with their old state after the supervisor restarts is not generally advisable unless you have good reason to replay potentially bad state. If you save too much state for these processes, and that state is what actually killed the supervisor in the first place, you're only creating a cascade of crashes: the supervisors of the supervisors will start crashing fast enough to exceed their own limits, until the entire supervision tree is gone.
- It can be a good idea not to put logic like this in your supervisors, but instead to have a managing process (I prefer to have `X.Supervisor` and `X.Manager` for whatever `X` is) that deals with the logic of spawning, killing and otherwise managing whatever the thing is. In this case, that would mean the supervisor sending an asynchronous message to the manager when it starts, and the manager then starting the children. The message has to be asynchronous: a synchronous call would deadlock, because the manager would be calling back into a supervisor that is still blocked in its `init/1`.
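The restart-limit point from the first bullet can be seen in a small experiment. This is a minimal sketch, assuming a hypothetical `PoisonDemo.Child` that replays its saved state right after starting (via `handle_continue/2`) and crashes if that state is `:poisoned`; the limits passed to the supervisor are just Elixir's defaults written out.

```elixir
defmodule PoisonDemo.Child do
  # Hypothetical child that "replays" its saved state as soon as it starts.
  use GenServer

  def start_link(saved_state), do: GenServer.start_link(__MODULE__, saved_state)

  @impl true
  def init(saved_state), do: {:ok, saved_state, {:continue, :replay}}

  # Poisoned state kills the child right after every (re)start.
  @impl true
  def handle_continue(:replay, :poisoned), do: raise("poisoned state")
  def handle_continue(:replay, state), do: {:noreply, state}
end

defmodule PoisonDemo do
  # Starts a supervisor over one child with the given saved state and returns
  # the supervisor's exit reason, or :alive if it is still running after 1s.
  # Note: this sets trap_exit on the calling process.
  def run(saved_state) do
    Process.flag(:trap_exit, true)

    {:ok, sup} =
      Supervisor.start_link([{PoisonDemo.Child, saved_state}],
        strategy: :one_for_one,
        # The defaults: more than 3 restarts within 5 seconds and the
        # supervisor itself gives up and exits.
        max_restarts: 3,
        max_seconds: 5
      )

    receive do
      {:EXIT, ^sup, reason} -> reason
    after
      1_000 -> :alive
    end
  end
end
```

With `:poisoned` state the child crashes on every restart, the supervisor exceeds its limit within milliseconds and exits with reason `:shutdown`, which its own parent would then count against *its* limit. With any other state the supervisor just keeps running.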
Generally I’m wary of potentially replaying poisoned state, so much so that I’ve always felt that it wasn’t worth it.
Here is a sketch of what you could do, though:
`Sandbox.SomeChild`:

```elixir
defmodule Sandbox.SomeChild do
  # :transient means the child is only restarted if it exits abnormally
  use GenServer, restart: :transient

  def start_link(name), do: GenServer.start_link(__MODULE__, [name])
  def name(pid), do: GenServer.call(pid, :name)

  # The state is just the name the child was started with
  def init([name]), do: {:ok, name}
  def handle_call(:name, _from, name), do: {:reply, name, name}
end
```
`Sandbox.SomeChild.Supervisor`:

```elixir
defmodule Sandbox.SomeChild.Supervisor do
  use DynamicSupervisor

  def start_link([]), do: DynamicSupervisor.start_link(__MODULE__, [], name: __MODULE__)

  def start_child(supervisor_pid, name) do
    spec = {Sandbox.SomeChild, name}
    DynamicSupervisor.start_child(supervisor_pid, spec)
  end

  def init([]) do
    # Cast, not call: the manager's start_child/2 calls are only
    # served once this init/1 has returned
    Sandbox.SomeChild.Manager.start_children(self())
    DynamicSupervisor.init(strategy: :one_for_one)
  end
end
```
`Sandbox.SomeChild.Manager`:

```elixir
defmodule Sandbox.SomeChild.Manager do
  use GenServer

  def start_link([]), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  def names(pid \\ __MODULE__), do: GenServer.call(pid, :names)
  def add_name(name, pid \\ __MODULE__), do: GenServer.cast(pid, {:add_name, name})

  def start_children(supervisor_pid, pid \\ __MODULE__) do
    GenServer.cast(pid, {:start_children, supervisor_pid})
  end

  # The state is the list of names to (re)start children with
  def init([]), do: {:ok, []}

  def handle_call(:names, _from, names) do
    {:reply, names, names}
  end

  def handle_cast({:add_name, name}, names) do
    {:noreply, [name | names]}
  end

  # Sent by the supervisor every time it (re)starts
  def handle_cast({:start_children, supervisor_pid}, names) do
    Enum.each(names, &Sandbox.SomeChild.Supervisor.start_child(supervisor_pid, &1))
    {:noreply, names}
  end
end
```
```elixir
iex(1)> Sandbox.SomeChild.Manager.start_link([])
{:ok, #PID<0.163.0>}
iex(2)> Sandbox.SomeChild.Manager.add_name("hej")
:ok
iex(3)> Sandbox.SomeChild.Supervisor.start_link([])
{:ok, #PID<0.166.0>}
iex(4)> [{_, pid, _, _}] = DynamicSupervisor.which_children(Sandbox.SomeChild.Supervisor)
[{:undefined, #PID<0.168.0>, :worker, [Sandbox.SomeChild]}]
iex(5)> Sandbox.SomeChild.name(pid)
"hej"
iex(6)> Sandbox.SomeChild.Manager.names()
["hej"]
```
And with two names, in a fresh session:

```elixir
iex(1)> Sandbox.SomeChild.Manager.start_link([])
{:ok, #PID<0.154.0>}
iex(2)> Sandbox.SomeChild.Manager.add_name("hej")
:ok
iex(3)> Sandbox.SomeChild.Manager.add_name("hopp")
:ok
iex(4)> Sandbox.SomeChild.Supervisor.start_link([])
{:ok, #PID<0.158.0>}
iex(5)> [{_, child1, _, _}, {_, child2, _, _}] = DynamicSupervisor.which_children(Sandbox.SomeChild.Supervisor)
[
  {:undefined, #PID<0.160.0>, :worker, [Sandbox.SomeChild]},
  {:undefined, #PID<0.161.0>, :worker, [Sandbox.SomeChild]}
]
iex(6)> Sandbox.SomeChild.Manager.names()
["hopp", "hej"]
iex(7)> [child1, child2] |> Enum.map(&Sandbox.SomeChild.name/1)
["hopp", "hej"]
```
Note that if there is anything fatally wrong with the state stored in the manager, you'll essentially have set up a guarantee that everything crashes almost instantly, as long as that poisoned state is used early enough in the restarted children's lifetime. In the worst case, the poisoned state doesn't kill the children fast enough to exceed the restart limits: you just have bombs lying in wait that never crash things fast enough to bring the system down, so you can't rely on a bad system being shut down for safety.
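The "bombs lying in wait" case can be sketched the same way. Assume a hypothetical `TickingBomb.Child` that only touches its replayed state some time after starting, and reports each (re)start to an observer process. As long as it crashes more slowly than the supervisor's limit allows, the supervisor restarts it indefinitely and never gives up.

```elixir
defmodule TickingBomb.Child do
  # Hypothetical child that only uses its replayed state after `delay_ms`.
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init({state, delay_ms, observer}) do
    # Report every (re)start to the observer process
    send(observer, {:started, self()})
    Process.send_after(self(), :use_state, delay_ms)
    {:ok, state}
  end

  @impl true
  def handle_info(:use_state, :poisoned), do: raise("poisoned state")
  def handle_info(:use_state, state), do: {:noreply, state}
end

defmodule TickingBomb do
  # Runs a poisoned child under a supervisor for `watch_ms` and returns
  # {supervisor_alive?, number_of_child_starts_seen}.
  def run(delay_ms, watch_ms) do
    # Don't die with the supervisor if it does give up
    Process.flag(:trap_exit, true)
    child = {TickingBomb.Child, {:poisoned, delay_ms, self()}}

    {:ok, sup} =
      Supervisor.start_link([child],
        strategy: :one_for_one,
        # Give up only after more than 3 restarts within 1 second
        max_restarts: 3,
        max_seconds: 1
      )

    Process.sleep(watch_ms)
    {Process.alive?(sup), count_starts(0)}
  end

  # Drains the {:started, pid} messages already in the mailbox
  defp count_starts(n) do
    receive do
      {:started, _pid} -> count_starts(n + 1)
    after
      0 -> n
    end
  end
end
```

Calling `TickingBomb.run(1_000, 2_600)` should report the supervisor as still alive after the child has already been started (and crashed) several times: the system stays up while repeatedly running a child that is known to be broken.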