Task.Supervisor and SIGTERM

In iex…

iex> Task.Supervisor.async_nolink(
  MyTaskSupervisor,
  fn -> Process.sleep(60_000) end,
  shutdown: 20_000
)

Then in a shell, I do…

kill 27282 # The unix pid for iex

In iex, I see this:

[notice] SIGTERM received - shutting down

And then the iex process ends almost immediately. It does not wait for the shutdown: 20_000. What am I doing wrong?

Without Process.flag(:trap_exit, true), the process just stops when it gets an exit signal. This is documented quite well here: Process — Elixir v1.13.4

{:ok, pid} = Task.Supervisor.start_link()

Task.Supervisor.async_nolink(
  pid,
  fn -> 
    Process.flag(:trap_exit, true)
    Process.sleep(60_000) 
  end,
  shutdown: 20_000
)

Process.unlink(pid)

Process.sleep(100) # Make sure the task has time to set the flag

:timer.tc(fn -> 
  ref = Process.monitor(pid)
  Process.exit(pid, :normal)

  receive do 
    {:DOWN, ^ref, :process, _pid, _reason} -> :ok
  end
end)
# {20000947, :ok}

I find it confusing what needs to trap exits with Process.flag(:trap_exit, true).

I would have guessed that Task.Supervisor does it on startup so that it can manage each child task’s :shutdown arg.

My second guess would have been the process that is calling Task.Supervisor.

I would not have guessed the right answer is in each task being spawned.

There are a few steps here:

When Elixir exits, the exit signal travels down the supervision tree.

That happens because supervisors always trap exits (for multiple reasons). So when they get an exit signal, they try to stop all their children. They do so by sending an exit signal to their children one by one, with a timeout before they brutally kill them. That timeout is the shutdown you’re setting, and the timeout is all that option controls. Once all children are stopped, the supervisor itself stops.

The child (could be any process, but in your case a task) still needs to decide how to react to exit signals. By default a task does not trap exits, it seems, so it just stops when it receives an exit signal. If you want the task to continue and try to finish its workload before being killed by the supervisor, you need to enable the :trap_exit flag.
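To make that concrete, here is a minimal sketch of a task that traps exits and reacts to the supervisor's exit signal instead of dying on the spot. It assumes the supervisor is stopped with Supervisor.stop/1 as a stand-in for a real SIGTERM shutdown, and the cleanup step is only a placeholder comment; the 100 ms sleep is the same crude "let the task set its flag" wait used earlier in the thread.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

task =
  Task.Supervisor.async_nolink(
    sup,
    fn ->
      # With :trap_exit set, exit signals arrive as {:EXIT, from, reason} messages.
      Process.flag(:trap_exit, true)

      receive do
        {:EXIT, _from, reason} ->
          # Placeholder: flush buffers, finish the current unit of work, etc.
          {:stopping, reason}
      after
        60_000 -> :finished
      end
    end,
    shutdown: 20_000
  )

# Make sure the task has time to set the flag
Process.sleep(100)

# Stand-in for the shutdown triggered by SIGTERM (assumption for this sketch)
:ok = Supervisor.stop(sup)

result = Task.await(task)
IO.inspect(result, label: "task result")
```

Because the task returns as soon as it sees the exit message, the supervisor does not have to wait out the full 20 seconds; the shutdown value is only the upper bound before the task is killed.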

I am with you, however, that it’s strange that there’s a default shutdown timeout when tasks don’t trap exits by default, so tasks would usually just stop immediately anyway.
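A rough way to see this for yourself is to time how long the supervisor takes to stop when the task does not trap exits. This is only a sketch: it uses Supervisor.stop/1 in place of a real SIGTERM, and the exact timing will vary by machine.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# No :trap_exit here, so the task dies as soon as the supervisor
# sends it the shutdown exit signal.
Task.Supervisor.async_nolink(
  sup,
  fn -> Process.sleep(60_000) end,
  shutdown: 20_000
)

{elapsed_us, :ok} = :timer.tc(fn -> Supervisor.stop(sup) end)

IO.puts("supervisor stopped after #{div(elapsed_us, 1000)} ms")
```

The elapsed time should come out far below the 20 second shutdown value, since nothing makes the supervisor wait.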


Ahhh, ok, so both Task.Supervisor and the child tasks need to trap exits. Thanks for the explanation, makes sense now.

Yeah. Each process is responsible for its own behaviour. The supervisor just makes sure that all children are stopped (gracefully at first, forcefully afterwards). How the children manage being stopped is their responsibility.
