Restarting a task that can be awaited on

fireproofsocks · July 20, 2024, 8:34pm

I’m starting to get deeper into the subtleties of supervisors and I could use some help understanding how and when to restart failed tasks.

Given a risky_work() function that may raise errors, I can ensure that it eventually completes by supervising a task that calls it, like this (assume MyTaskSupervisor is already started):

Task.Supervisor.start_child(
  MyTaskSupervisor,
  fn -> risky_work() end,
  restart: :transient
)

Even if the risky_work function hits errors, it restarts and it eventually completes successfully.

This is (I think) an example of a non-awaited task. How would this need to be structured if I wanted that same risky_work to be part of an awaited task?

The following attempt doesn’t ensure that the task completes:

task = Task.Supervisor.async_nolink(MyTaskSupervisor, fn ->
  risky_work()
end)
Task.await(task) # or Task.yield(task)

The risky_work function is called immediately and any errors it raises are immediately visible. The docs show how Task.Supervisor.async_nolink/3 might get used inside of a GenServer… I can get that example working, but it still doesn’t ensure that the risky_work ever completes. I can see where I can restart the task – there’s even a comment in the example # Log and possibly restart the task..., but if I am restarting the task, I feel like I’m doing something wrong. Isn’t that what the supervisor is supposed to do?

Thanks for any clarifications! I feel like maybe I’m thinking about this the wrong way.

al2o3cr · July 20, 2024, 10:03pm

The way that an await-able Task gets its arguments isn’t compatible with being restarted by a standard Supervisor - the Task process is started up, and then the MFA and alias to use are sent to it:

github.com/elixir-lang/elixir/blob/v1.17.2/lib/elixir/lib/task/supervisor.ex#L599-L603

          {:ok, pid} ->
            if link_type == :link, do: Process.link(pid)
            alias = :erlang.monitor(:process, pid, alias: :demonitor)
            send(pid, {owner, alias, alias, get_callers(owner), {module, fun, args}})
            %Task{pid: pid, ref: alias, owner: owner, mfa: {module, fun, length(args)}}

github.com/elixir-lang/elixir/blob/47abe2d107e654ccede845356773bcf6e11ef7cb/lib/elixir/lib/task.ex#L512-L516

          {:ok, pid} = Task.Supervised.start_link(get_owner(owner), :nomonitor)
          
          alias = build_alias(pid)
          send(pid, {owner, alias, alias, get_callers(owner), mfargs})
          %Task{pid: pid, ref: alias, owner: owner, mfa: {module, function_name, length(args)}}

So “restarting” the task process will leave it waiting for that initial tuple message forever.
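
To make that concrete, here is a minimal sketch of the same handshake using plain processes. HandshakeDemo is a hypothetical, simplified stand-in and not the actual Task.Supervised code: the process only learns what to run from a message sent after it has started, so a replacement process that never gets that message just sits in receive.

```elixir
defmodule HandshakeDemo do
  # Hypothetical stand-in for an awaitable task process (an assumption for
  # illustration, not Elixir's real implementation). It does nothing until
  # the caller sends it the work and a reference to reply with.
  def start do
    spawn(fn ->
      receive do
        {caller, ref, fun} -> send(caller, {ref, fun.()})
      end
    end)
  end
end

caller = self()
ref = make_ref()

# Normal flow: start the process, then hand it the work in a message.
pid = HandshakeDemo.start()
send(pid, {caller, ref, fn -> 1 + 1 end})

receive do
  {^ref, result} -> IO.inspect(result, label: "result")
after
  1_000 -> IO.puts("no reply")
end

# "Restarted" flow: a replacement process is started, but nobody sends it
# the tuple again, so it stays blocked in `receive`.
restarted = HandshakeDemo.start()
Process.sleep(100)
IO.inspect(Process.info(restarted, :status), label: "restarted process")
```

A supervisor restart only re-runs the start function; it has no way to replay the message the original caller sent, which is why the replacement would wait forever.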

This is obliquely referenced in the documentation for Task.Supervisor.async_nolink/3:

Note this function requires the task supervisor to have :temporary as the :restart option (the default), as async_nolink/3 keeps a direct reference to the task which is lost if the task is restarted.
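
Since the supervisor can't restart an awaited task for you, one option is to do the retry on the caller's side. The following is only a sketch under assumptions: RetryAwait and its run/4 function are hypothetical names, not part of the Task API, and the attempt count and timeout are arbitrary defaults. Each attempt starts a fresh task with Task.Supervisor.async_nolink/2, so the caller always holds a valid reference to the task it is waiting on.

```elixir
defmodule RetryAwait do
  # Hypothetical helper (not part of Elixir's Task API): runs `fun` as an
  # awaited, supervised task and starts a new task if the previous one
  # crashed or timed out, up to `attempts` times.
  def run(supervisor, fun, attempts \\ 3, timeout \\ 5_000)

  def run(_supervisor, _fun, 0, _timeout), do: {:error, :retries_exhausted}

  def run(supervisor, fun, attempts, timeout) when attempts > 0 do
    # A crash in a nolink task does not take the caller down with it.
    task = Task.Supervisor.async_nolink(supervisor, fun)

    # Task.yield/2 returns nil on timeout; Task.shutdown/1 then stops the
    # task and returns its reply if one arrived in the meantime.
    case Task.yield(task, timeout) || Task.shutdown(task) do
      {:ok, result} ->
        {:ok, result}

      # {:exit, reason} means the task crashed; nil means it timed out.
      _crashed_or_timed_out ->
        run(supervisor, fun, attempts - 1, timeout)
    end
  end
end
```

With MyTaskSupervisor started, a call would look like RetryAwait.run(MyTaskSupervisor, fn -> risky_work() end). The supervisor still owns the task processes and cleans them up on shutdown; only the decision to try again lives with the caller, because the caller is the one holding the reference.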

fireproofsocks · July 22, 2024, 3:08pm

Interesting. Thanks!