Spawning temporary supervisors

I have a problem supervising user-defined jobs.
Jobs consist of steps, and between steps I save the last known finished state.
After a node restart, I read that state from the DB and resume.

I started by using DynamicSupervisor with Task for each user-specified job.
The problem is that when a job is ill-defined, it can fail immediately.
I want to restart it three times and then give up.

However, I don’t want to kill the entire DynamicSupervisor with it because other tasks are perfectly fine.
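To make the failure mode concrete, here is a minimal sketch of the flat setup, assuming the tasks are started as :transient children and the DynamicSupervisor allows 3 restarts in 5 seconds (both values are assumptions for illustration, not taken from the original setup):

```elixir
# Trap exits so this script survives when the linked supervisor gives up.
Process.flag(:trap_exit, true)

{:ok, sup} =
  DynamicSupervisor.start_link(strategy: :one_for_one, max_restarts: 3, max_seconds: 5)

# A healthy job that just waits, and an ill-defined job that fails immediately.
healthy = Supervisor.child_spec({Task, fn -> Process.sleep(:infinity) end}, restart: :transient)
broken = Supervisor.child_spec({Task, fn -> raise "ill-defined job" end}, restart: :transient)

{:ok, healthy_pid} = DynamicSupervisor.start_child(sup, healthy)
{:ok, _broken_pid} = DynamicSupervisor.start_child(sup, broken)

# Once the restart limit is exceeded, the supervisor itself shuts down
# and terminates the healthy job along with it.
receive do
  {:EXIT, ^sup, reason} -> IO.inspect(reason, label: "supervisor exited")
end

IO.inspect(Process.alive?(healthy_pid), label: "healthy job alive?")
```

The restart limit is tracked per supervisor, not per child, which is why one broken job is enough to take the healthy ones down.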

So I decided to change the hierarchy:

DynamicSupervisorOfSupervisors
`- JobSupervisor (one_for_one, restart: :temporary) (don't restart it if it crashes; a crash means the worker failed several times in quick succession, so we want to give up)
   `- ActualWorker (restart: :transient) (restart if it fails)
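In code, that hierarchy might look roughly like the sketch below. The module name JobSupervisor comes from the diagram; passing the job as a zero-arity function, running it as a Task, and the 3-restarts-in-5-seconds limit are assumptions made for the example:

```elixir
defmodule JobSupervisor do
  # :temporary so the parent DynamicSupervisor never restarts a JobSupervisor that gave up.
  use Supervisor, restart: :temporary

  def start_link(job_fun) when is_function(job_fun, 0) do
    Supervisor.start_link(__MODULE__, job_fun)
  end

  @impl true
  def init(job_fun) do
    children = [
      # :transient restarts the worker only after an abnormal exit.
      Supervisor.child_spec({Task, job_fun}, id: :worker, restart: :transient)
    ]

    # Allow 3 restarts within 5 seconds; beyond that this supervisor exits.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

# Usage: one JobSupervisor per job under the top-level DynamicSupervisor.
{:ok, jobs} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, job_sup} = DynamicSupervisor.start_child(jobs, {JobSupervisor, fn -> :ok end})
```

With this layout, a job that keeps failing only takes down its own JobSupervisor, and the top-level supervisor does not restart it.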

That solves one problem: the DynamicSupervisor no longer crashes, and restarts work fine. But when the job finishes normally, I am left with a dangling JobSupervisor that has nothing to supervise but didn’t crash.

Is there an elegant way to spawn a supervisor that terminates once its last child has finished?

Maybe check out alternative supervisor implementations such as @sasajuric’s Parent, supervisor2, or director. All of them should give you finer-grained control over the supervisor process.


The upcoming 0.11 version of Parent could indeed be used for this.

An untested sketch would look something like:

defmodule Job do
  use Parent.GenServer, restart: :temporary

  def start_link(arg), do: Parent.GenServer.start_link(__MODULE__, arg)

  @impl GenServer
  def init(arg) do
    {:ok, _pid} = 
      Parent.start_child(%{
        id: :job, 
        restart: :transient, 
        ephemeral?: true, 
        start: mfa_or_zero_arity_fun
      })

    {:ok, initial_state}
  end

  @impl Parent.GenServer
  def handle_stopped_children(%{job: _}, state), do: {:stop, :normal, state}
end

See docs for more details, and let me know if you have some questions.

If Parent weren’t available, I’d develop the same thing manually. Basically, I’d turn JobSupervisor into a GenServer named Job that traps exits and starts a Task process as a child, handling :EXIT messages and counting the restarts itself.
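A rough sketch of that manual approach, using only GenServer and Task from the standard library. The limit of three restarts, the state shape, and passing the job as a zero-arity function are assumptions for illustration. Unlike a real supervisor, this version counts restarts over the whole lifetime of the job instead of within a time window:

```elixir
defmodule Job do
  # :temporary so the parent supervisor does not restart a job that gave up.
  use GenServer, restart: :temporary

  # Assumed limit, taken from the "restart three times" requirement.
  @max_restarts 3

  def start_link(job_fun) when is_function(job_fun, 0) do
    GenServer.start_link(__MODULE__, job_fun)
  end

  @impl true
  def init(job_fun) do
    # Trap exits so a crashing task arrives as an :EXIT message instead of killing us.
    Process.flag(:trap_exit, true)
    {:ok, task_pid} = Task.start_link(job_fun)
    {:ok, %{job_fun: job_fun, task_pid: task_pid, restarts: 0}}
  end

  @impl true
  # The task finished normally: the job is done, so stop as well.
  def handle_info({:EXIT, pid, :normal}, %{task_pid: pid} = state) do
    {:stop, :normal, state}
  end

  # The task crashed and the restart budget is spent: give up.
  def handle_info({:EXIT, pid, _reason}, %{task_pid: pid, restarts: @max_restarts} = state) do
    {:stop, :shutdown, state}
  end

  # The task crashed: start it again and count the restart.
  def handle_info({:EXIT, pid, _reason}, %{task_pid: pid} = state) do
    {:ok, task_pid} = Task.start_link(state.job_fun)
    {:noreply, %{state | task_pid: task_pid, restarts: state.restarts + 1}}
  end

  # Ignore any other message.
  def handle_info(_message, state), do: {:noreply, state}
end

# Usage: the Job process goes away by itself once the task finishes.
{:ok, jobs} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, job} = DynamicSupervisor.start_child(jobs, {Job, fn -> Process.sleep(50) end})
```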

I created Parent after getting fed up with having to do this manually again and again :slight_smile:


Thanks, I’ll check it out!

For now, I pass the PID of the JobSupervisor to the job and call Supervisor.stop(supervisor_pid) at the end.
Parent seems more elegant though, so I might refactor to it later.
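For anyone finding this later, the workaround looks roughly like the sketch below. The module name and the wrapping of the job in a zero-arity function are made up for the example; the only call from my actual code is Supervisor.stop/1:

```elixir
defmodule SelfStoppingJobSupervisor do
  use Supervisor, restart: :temporary

  def start_link(job_fun) when is_function(job_fun, 0) do
    Supervisor.start_link(__MODULE__, job_fun)
  end

  @impl true
  def init(job_fun) do
    # init/1 runs inside the supervisor process, so self() is the supervisor PID.
    supervisor_pid = self()

    worker = fn ->
      job_fun.()
      # Last step of a successful run: ask the supervisor to stop.
      Supervisor.stop(supervisor_pid)
    end

    children = [Supervisor.child_spec({Task, worker}, id: :worker, restart: :transient)]
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

# Usage: the supervisor is gone shortly after the job completes.
{:ok, sup} = SelfStoppingJobSupervisor.start_link(fn -> :ok end)
```

The stop call has to be the very last thing the job does, since the supervisor shuts the worker down as part of stopping.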
