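You might want to delegate worker restarts to another GenServer. This keeps a crashing worker from taking the whole system down. First, mark the worker as :temporary so the supervisor never restarts it on its own: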
defmodule GameEngine.Games.Worker do
  @moduledoc false
  use GenServer, restart: :temporary
  ...
end
And in a linked process that traps exits:
# Trap EXIT signals coming from the linked workers
@impl GenServer
def handle_info({:EXIT, pid, reason}, %{worker_sup: worker_sup} = state) do
  log("#{@name} caught EXIT #{inspect(reason)}")

  case :ets.lookup(__MODULE__, pid) do
    # Pin pid so only the entry for the exited process matches
    [{^pid, name, sender, receiver}] ->
      true = :ets.delete(__MODULE__, pid)

      # Do not restart the worker if it stopped normally or timed out!
      if reason in [:normal, {:shutdown, :timeout}] do
        notify(%{type: :game_stopped, payload: name})
      else
        log("Restarting : #{name}")

        case start_worker(worker_sup, %{uuid: name, sender: sender, receiver: receiver}) do
          {:ok, worker} ->
            true = :ets.insert(__MODULE__, {worker, name, sender, receiver})

          {:error, reason} ->
            log("Could not restart : #{name} #{inspect(reason)}")
        end
      end

    [] ->
      # Unknown pid, nothing to clean up
      :ok
  end

  # handle_info/2 cannot reply, so always return :noreply
  {:noreply, state}
end
defp start_worker(sup, %{uuid: name} = args) do
  spec = %{
    id: name,
    start: {Worker, :start_link, [args]},
    restart: :temporary,
    type: :worker
  }

  case DynamicSupervisor.start_child(sup, spec) do
    {:ok, worker} ->
      Process.link(worker)
      notify(%{type: :game_created, payload: Worker.get_state(worker)})
      {:ok, worker}

    {:error, reason} ->
      {:error, reason}
  end
end
In this case, you can adapt the response to the exit reason. This way it is a GenServer, not the supervisor, that decides whether to restart. A worker can hit an error and die, but it won't take your system down.
Or, in your case, you could return :error and handle it when it happens.
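The snippets above leave out the part that makes the `{:EXIT, pid, reason}` messages arrive in the first place, so here is a minimal, self-contained sketch of the whole idea. Everything under `RestartSketch` is a hypothetical name made up for illustration, the worker state is just a name, and a plain map replaces the ETS table to keep it short. It is not the actual GameEngine code, only one way the pieces might fit together.

```elixir
defmodule RestartSketch.Worker do
  # Hypothetical worker: it only holds its name.
  # :temporary means the supervisor never restarts it by itself.
  use GenServer, restart: :temporary

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl GenServer
  def init(name), do: {:ok, name}
end

defmodule RestartSketch.Manager do
  # Hypothetical manager: links to workers, traps exits,
  # and decides on its own whether a worker gets restarted.
  use GenServer

  def start_link(sup), do: GenServer.start_link(__MODULE__, sup)
  def start_worker(manager, name), do: GenServer.call(manager, {:start_worker, name})
  def workers(manager), do: GenServer.call(manager, :workers)

  @impl GenServer
  def init(sup) do
    # Without this flag, a crashing linked worker would kill the manager too.
    Process.flag(:trap_exit, true)
    {:ok, %{sup: sup, workers: %{}}}
  end

  @impl GenServer
  def handle_call({:start_worker, name}, _from, state) do
    case do_start(state.sup, name) do
      {:ok, pid} -> {:reply, {:ok, pid}, put_in(state.workers[pid], name)}
      {:error, reason} -> {:reply, {:error, reason}, state}
    end
  end

  def handle_call(:workers, _from, state), do: {:reply, state.workers, state}

  @impl GenServer
  def handle_info({:EXIT, pid, reason}, state) do
    {name, workers} = Map.pop(state.workers, pid)
    state = %{state | workers: workers}

    cond do
      # Not one of our workers: ignore.
      is_nil(name) ->
        {:noreply, state}

      # Clean stop: do not restart.
      reason in [:normal, :shutdown] ->
        {:noreply, state}

      # Anything else: try to restart under the same name.
      true ->
        case do_start(state.sup, name) do
          {:ok, new_pid} -> {:noreply, put_in(state.workers[new_pid], name)}
          {:error, _reason} -> {:noreply, state}
        end
    end
  end

  defp do_start(sup, name) do
    with {:ok, pid} <- DynamicSupervisor.start_child(sup, {RestartSketch.Worker, name}) do
      Process.link(pid)
      {:ok, pid}
    end
  end
end
```

To try it, start a `DynamicSupervisor` with `DynamicSupervisor.start_link(strategy: :one_for_one)`, pass its pid to `RestartSketch.Manager.start_link/1`, start a worker with `RestartSketch.Manager.start_worker(manager, "game-1")`, then kill that worker with `Process.exit(pid, :kill)`. A moment later `RestartSketch.Manager.workers(manager)` should list a new pid under the same name. Note that linking after `start_child/2` leaves a small window in which the worker could die before the link exists; since the manager traps exits, it would then receive an EXIT with reason `:noproc` and attempt a restart.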