I have a problem supervising user-defined jobs.
Jobs have steps, and between steps I save the last known finished state.
After a node restart, I read that state from the DB and resume.
I started by using a DynamicSupervisor with a Task for each user-specified job.
The problem is that an ill-defined job can fail immediately.
I want to restart it three times and then give up.
However, I don't want to take down the entire DynamicSupervisor with it, because the other tasks are perfectly fine.
I decided to change the hierarchy:
DynamicSupervisorOfSupervisors
`- JobSupervisor (one_for_one, restart: :temporary) (don't restart it if it crashes; a crash means the worker failed several times in quick succession, so we want to give up)
   `- ActualWorker (restart: :transient) (restart if it fails)
That solves one problem, because now I don't crash the DynamicSupervisor and restarts work fine. But when the job finishes normally, I am left with a dangling JobSupervisor that has nothing to supervise but didn't crash.
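For readers who want to reproduce this, here is a minimal sketch of the hierarchy described above using only the standard `Supervisor`, `DynamicSupervisor`, and `Task` modules. The module name `JobSupervisor` comes from the post; the `max_seconds: 5` window and the trivial job function are assumptions made for illustration.

```elixir
defmodule JobSupervisor do
  # :temporary so the parent DynamicSupervisor never restarts this supervisor.
  # If it dies because the worker exceeded max_restarts, the job is abandoned.
  use Supervisor, restart: :temporary

  def start_link(job_fun), do: Supervisor.start_link(__MODULE__, job_fun)

  @impl true
  def init(job_fun) do
    children = [
      # Task defaults to restart: :temporary, so override it:
      # :transient restarts the worker only after an abnormal exit.
      Supervisor.child_spec({Task, job_fun}, restart: :transient)
    ]

    # Allow 3 restarts; the next failure inside the window stops this supervisor.
    # The 5 second window is an assumption, not something stated in the post.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

{:ok, jobs} = DynamicSupervisor.start_link(strategy: :one_for_one)

# A job that finishes normally right away (stand-in for a real job).
{:ok, job_sup} = DynamicSupervisor.start_child(jobs, {JobSupervisor, fn -> :ok end})
```

Once the task returns, `Supervisor.count_children(job_sup)` should report no active children while `Process.alive?(job_sup)` is still true, which is the dangling supervisor in question.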
Is there an elegant solution for spawning a supervisor that finishes with its last finished child?
Maybe check out alternative supervisor implementations like @sasajuric's Parent, supervisor2, or director. All of them should give you finer-grained control over the supervisor process.
The upcoming 0.11 version of Parent could indeed be used for this.
An untested sketch would look something like:
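    defmodule Job do
      use Parent.GenServer, restart: :temporary

      def start_link(arg), do: Parent.GenServer.start_link(__MODULE__, arg)

      @impl GenServer
      def init(arg) do
        {:ok, _pid} =
          Parent.start_child(%{
            id: :job,
            # restart the job only if it exits abnormally
            restart: :transient,
            ephemeral?: true,
            # placeholder: an MFA tuple or zero-arity function that starts the job
            start: mfa_or_zero_arity_fun
          })

        # placeholder: whatever state the Job process needs
        {:ok, initial_state}
      end

      @impl Parent.GenServer
      # the job has stopped for good, so stop the parent as well
      def handle_stopped_children(%{job: _}, state), do: {:stop, :normal, state}
    end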
See docs for more details, and let me know if you have some questions.
If `Parent` wasn't available, I'd develop the same thing manually. Basically I'd turn `JobSupervisor` into a `GenServer` named `Job`, trap exits, and start a `Task` process as a child, handling `:EXIT` messages and manually calculating the number of restarts.
I've created parent after being fed up with having to do this manually again and again.
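The manual approach described above could be sketched roughly as follows. This is an illustration built on plain `GenServer` and `Task`, not the code Parent uses internally. The `@max_restarts` value, the state shape, and the `{:shutdown, :too_many_restarts}` stop reason are assumptions, and the restart counter here is a simple total rather than a count within a time window.

```elixir
defmodule Job do
  # :temporary so the parent supervisor never restarts this process.
  use GenServer, restart: :temporary

  # Assumed limit, matching the "restart three times" requirement.
  @max_restarts 3

  def start_link(job_fun), do: GenServer.start_link(__MODULE__, job_fun)

  @impl true
  def init(job_fun) do
    # Turn exit signals from the linked task into {:EXIT, pid, reason} messages.
    Process.flag(:trap_exit, true)
    {:ok, task_pid} = Task.start_link(job_fun)
    {:ok, %{job_fun: job_fun, task_pid: task_pid, restarts: 0}}
  end

  @impl true
  # The job finished normally: stop together with it.
  def handle_info({:EXIT, pid, :normal}, %{task_pid: pid} = state) do
    {:stop, :normal, state}
  end

  # The job failed again after the allowed restarts were used up: give up.
  def handle_info({:EXIT, pid, _reason}, %{task_pid: pid, restarts: @max_restarts} = state) do
    {:stop, {:shutdown, :too_many_restarts}, state}
  end

  # The job failed and restarts remain: start it again and count the restart.
  def handle_info({:EXIT, pid, _reason}, %{task_pid: pid} = state) do
    {:ok, task_pid} = Task.start_link(state.job_fun)
    {:noreply, %{state | task_pid: task_pid, restarts: state.restarts + 1}}
  end

  # Ignore anything else.
  def handle_info(_message, state), do: {:noreply, state}
end
```

Such a `Job` can be started directly under the DynamicSupervisor with `DynamicSupervisor.start_child(sup, {Job, job_fun})`, which removes the intermediate supervisor entirely.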
Thanks, I’ll check it out!
For now, I am passing the PID of `JobSupervisor` to the job and calling `Supervisor.stop(supervisor_pid)` at the end.
Parent seems more elegant though, so I might refactor it later.
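A minimal sketch of that workaround, under some assumptions: the module name `SelfStoppingJobSupervisor`, the private `run/2` wrapper, and the restart limits are made up for illustration. The supervisor's PID is available as `self()` inside `init/1`, so it can be captured in the task's closure.

```elixir
defmodule SelfStoppingJobSupervisor do
  use Supervisor, restart: :temporary

  def start_link(job_fun), do: Supervisor.start_link(__MODULE__, job_fun)

  @impl true
  def init(job_fun) do
    # init/1 runs inside the supervisor process, so self() is its PID.
    supervisor_pid = self()

    children = [
      Supervisor.child_spec(
        {Task, fn -> run(job_fun, supervisor_pid) end},
        restart: :transient
      )
    ]

    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end

  # Hypothetical wrapper: run the job, then ask the supervisor to stop.
  defp run(job_fun, supervisor_pid) do
    job_fun.()
    # This line is reached only if the job did not crash.
    Supervisor.stop(supervisor_pid)
  end
end

{:ok, jobs} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, job_sup} =
  DynamicSupervisor.start_child(jobs, {SelfStoppingJobSupervisor, fn -> :ok end})
```

One caveat, as far as I understand the shutdown sequence: `Supervisor.stop/1` blocks the calling task, and the supervisor then terminates that task with `:shutdown` as part of its own shutdown. That is harmless for a plain `Task`, but code placed after the `Supervisor.stop/1` call should not be expected to run.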