Restart: :temporary children don't trigger sibling restart on failure

david_ex · April 23, 2018, 7:24am

I’d like to confirm my understanding of restart: :temporary workers and their use cases, as I couldn’t find anything about this, either in the docs or on the web.

Let’s say I have a supervisor Sup, using the :one_for_all strategy, with 2 children:

Temp, which has a :temporary restart value (Supervisor — Elixir v1.16.0). Note that despite the chosen restart value, this process is intended to always be alive.
Server, which has the default restart value (i.e. :permanent)

Per the docs, temporary processes are never restarted: if I kill Temp (e.g. in the Observer), it won’t get restarted. However, killing Temp doesn’t cause Sup to terminate and restart Server, even though Sup’s strategy is :one_for_all. The docs (Supervisor — Elixir v1.16.0) say that

if a child process terminates, all other child processes are terminated and then all child processes (including the terminated one) are restarted

Given the above, I would expect that, although temporary processes are never restarted, killing a temporary child would still cause its siblings to be restarted. Shouldn’t the docs instead say

if a (non-:temporary) process terminates, …

Or is this common knowledge/self-evident?
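The behaviour described above can be reproduced without Observer. Below is a minimal sketch, assuming two `Agent`s as stand-ins for Temp and Server; the ids `:temp`/`:server` and the `child_pid` helper are hypothetical. Killing the `:temporary` child under `:one_for_all` leaves the `:permanent` sibling running with the same pid:

```elixir
# Two Agents stand in for Temp (:temporary) and Server (:permanent by default).
children = [
  Supervisor.child_spec({Agent, fn -> :temp end}, id: :temp, restart: :temporary),
  Supervisor.child_spec({Agent, fn -> :server end}, id: :server)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_all)

# Hypothetical helper: pid of a running child with the given id, or nil.
child_pid = fn supervisor, id ->
  Enum.find_value(Supervisor.which_children(supervisor), fn
    {^id, pid, _type, _modules} when is_pid(pid) -> pid
    _other -> nil
  end)
end

server_before = child_pid.(sup, :server)
temp = child_pid.(sup, :temp)

# Kill Temp and wait until it is really gone.
ref = Process.monitor(temp)
Process.exit(temp, :kill)

receive do
  {:DOWN, ^ref, :process, ^temp, _reason} -> :ok
end

# Give Sup a moment to handle the child's exit signal.
Process.sleep(100)

temp_after = child_pid.(sup, :temp)
server_after = child_pid.(sup, :server)

IO.inspect(temp_after, label: "Temp after kill")                    # Temp after kill: nil
IO.inspect(server_after == server_before, label: "Server untouched") # Server untouched: true
```

Under the hood Elixir’s Supervisor delegates to Erlang’s :supervisor, which simply drops a terminated :temporary child instead of applying the restart strategy to its siblings.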

What’s the goal behind the above configuration? It implements the service/worker pattern discussed, e.g., in The basic Erlang service ⇒ worker pattern – The Intellectual Wilderness, where Sup starts only Server, and Server then starts Temp while initializing. If Temp dies, Server should die too, and Sup should restart only Server, which in turn starts Temp again. (To be clear, although Server is the one triggering Temp to start, it does not supervise it: both Server and Temp are supervised by Sup.)

Since :temporary processes within a supervision tree never get restarted, and don’t trigger sibling restarts on failure, am I correct in assuming that to achieve the above, I need to trap exits within Server and link Server to Temp?
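One way to wire this up, as a minimal sketch only: the module names `Demo.Server` and `Demo.Temp`, the registered name `Demo.Sup` and the `worker/0` function are all hypothetical. Server traps exits, asks Sup to start Temp via `Supervisor.start_child/2`, and links to the returned pid. Temp is started from `handle_continue/2` (available since Elixir 1.7) because calling the supervisor from `init/1` would deadlock: the supervisor is blocked waiting for `init/1` to return.

```elixir
# Hypothetical Temp: a :temporary Agent standing in for the real worker.
defmodule Demo.Temp do
  use Agent, restart: :temporary

  def start_link(_arg), do: Agent.start_link(fn -> :working end)
end

defmodule Demo.Server do
  use GenServer

  def start_link(sup), do: GenServer.start_link(__MODULE__, sup, name: __MODULE__)
  def worker, do: GenServer.call(__MODULE__, :worker)

  @impl true
  def init(sup) do
    # Receive Temp's exit as an {:EXIT, pid, reason} message instead of dying.
    Process.flag(:trap_exit, true)
    # Defer start_child/2: calling Sup from init/1 would deadlock.
    {:ok, %{sup: sup, worker: nil}, {:continue, :start_worker}}
  end

  @impl true
  def handle_continue(:start_worker, state) do
    # Temp is a child of Sup, not of Server; Server only links to it.
    {:ok, pid} = Supervisor.start_child(state.sup, Demo.Temp)
    Process.link(pid)
    {:noreply, %{state | worker: pid}}
  end

  @impl true
  def handle_call(:worker, _from, state), do: {:reply, state.worker, state}

  @impl true
  def handle_info({:EXIT, pid, reason}, %{worker: pid} = state) do
    # Temp died: stop as well so that Sup restarts Server (and thus Temp).
    {:stop, {:worker_down, reason}, state}
  end

  def handle_info(_msg, state), do: {:noreply, state}
end

{:ok, _sup} =
  Supervisor.start_link([{Demo.Server, Demo.Sup}], strategy: :one_for_one, name: Demo.Sup)

server1 = Process.whereis(Demo.Server)
temp1 = Demo.Server.worker()

# Kill Temp and wait for Server to go down with it.
ref = Process.monitor(server1)
Process.exit(temp1, :kill)

receive do
  {:DOWN, ^ref, :process, ^server1, _reason} -> :ok
end

# Poll until Sup has restarted and registered a fresh Server.
server2 =
  Enum.find_value(1..100, fn _ ->
    case Process.whereis(Demo.Server) do
      pid when is_pid(pid) and pid != server1 ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

# The new Server has started a new Temp.
temp2 = Demo.Server.worker()
```

A plain link without trapping would already take Server down whenever Temp exits abnormally; trapping exits additionally lets Server notice a :normal exit from Temp and decide how to react.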

For my own edification, are there use cases for :temporary workers that aren’t linked or monitored? In other words, besides use cases similar to the above, when would you use a :temporary restart value instead of :transient?
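For comparison, here is a minimal sketch of how the two restart values differ under a :one_for_one supervisor (`Demo.Quitter`, the ids and the helper functions are hypothetical): a :transient child is restarted only after an abnormal exit, while a :temporary child is never restarted.

```elixir
# Hypothetical worker that stops with whatever reason it is sent.
defmodule Demo.Quitter do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  @impl true
  def handle_cast({:exit, reason}, state), do: {:stop, reason, state}
end

children = [
  Supervisor.child_spec({Demo.Quitter, :temp}, id: :temp, restart: :temporary),
  Supervisor.child_spec({Demo.Quitter, :trans}, id: :trans, restart: :transient)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Hypothetical helper: pid of a running child with the given id, or nil.
child_pid = fn supervisor, id ->
  Enum.find_value(Supervisor.which_children(supervisor), fn
    {^id, pid, _type, _modules} when is_pid(pid) -> pid
    _other -> nil
  end)
end

# Ask a child to stop with `reason`, wait for it to die, then let Sup react.
stop_child = fn pid, reason ->
  ref = Process.monitor(pid)
  GenServer.cast(pid, {:exit, reason})

  receive do
    {:DOWN, ^ref, :process, ^pid, _} -> :ok
  end

  Process.sleep(100)
end

trans1 = child_pid.(sup, :trans)

# Abnormal exit: only the :transient child comes back.
stop_child.(child_pid.(sup, :temp), :boom)
stop_child.(trans1, :boom)
temp_after_crash = child_pid.(sup, :temp)
trans2 = child_pid.(sup, :trans)

# Normal exit: the :transient child is not restarted either.
stop_child.(trans2, :normal)
trans_after_normal = child_pid.(sup, :trans)
```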

OvermindDL1 · April 23, 2018, 9:42pm

I use :temporary for processes that should perform some work and then die, without being restarted even if they fail, but that should still be introspectable by the OTP system so I can see if something is running away.
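A minimal sketch of that "do some work, then die" use case, assuming `Task.Supervisor` (whose `start_child/3` defaults to `restart: :temporary`) rather than a hand-written supervisor; the jobs themselves are placeholders:

```elixir
{:ok, tasks} = Task.Supervisor.start_link()

# Fire-and-forget jobs: start_child/3 uses restart: :temporary by default,
# so each job runs once and is never restarted, whether it succeeds or fails.
{:ok, slow} = Task.Supervisor.start_child(tasks, fn -> Process.sleep(300) end)
{:ok, _crashy} = Task.Supervisor.start_child(tasks, fn -> raise "boom" end)

Process.sleep(100)

# While they run, jobs are ordinary supervised children that OTP tooling
# (Observer, Task.Supervisor.children/1) can list.
running_mid = Task.Supervisor.children(tasks)
IO.inspect(running_mid == [slow], label: "only the slow job is running")

Process.sleep(400)
running_end = Task.Supervisor.children(tasks)
IO.inspect(running_end, label: "running after completion")
```

The crashed job is logged and then forgotten; nothing retries it.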

I’m not sure what Elixir is doing if it says that, but at least in Erlang a temporary child process is never restarted (not even when the supervisor’s restart strategy is rest_for_one or one_for_all and a sibling’s death causes the temporary process to be terminated). This is from:
https://erlang.org/doc/designprinciples/sup_princ.html

david_ex · April 24, 2018, 8:02am

But if the process is :temporary you have no guarantee (out of the box, i.e. without monitoring, etc.) that the work gets done: the process could die before having completed the work. Do you by any chance have a practical example where you don’t care about that? Is it just for “nice to have”-level stuff (e.g. some tracking metric that is nice when available but not worth recomputing on failure)?
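Monitoring is the "etc." that closes this gap when the caller does care. A minimal sketch, assuming `Task.Supervisor.async_nolink/2` plus `Task.yield/2` (the jobs and the `run_job` helper are hypothetical): the supervised task is still never restarted, but the caller learns whether it finished or crashed and can decide whether to retry.

```elixir
{:ok, tasks} = Task.Supervisor.start_link()

# Run `fun` as a supervised, unlinked task and report what happened.
# The caller monitors the task, so a crash becomes a value, not a crash.
run_job = fn fun ->
  task = Task.Supervisor.async_nolink(tasks, fun)

  case Task.yield(task, 1_000) || Task.shutdown(task) do
    {:ok, value} -> {:done, value}
    {:exit, reason} -> {:failed, reason}
    nil -> :timeout
  end
end

done = run_job.(fn -> 21 * 2 end)
failed = run_job.(fn -> raise "metric unavailable" end)

IO.inspect(done, label: "done") # done: {:done, 42}
```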

Regarding :temporary process restarts, my question is actually about the opposite situation from the one you mention above. Let’s say we have a supervisor S with a :one_for_all strategy. S has 2 children: a :permanent process P, and a :temporary process T.

It is clear to me that killing P will bring down T (and that T will not be restarted). However, killing T never seems to cause P to be restarted, even though the Erlang docs (from the same page, Erlang -- Supervisor Behaviour) state

If a child process terminates, all other child processes are terminated

But that’s clearly not the case: in the example above, T is a child process that gets killed (and therefore terminates), yet the other child process was NOT terminated, since P remained alive.
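The first direction is easy to check as well. A minimal sketch with `Agent`s standing in for P and T (the ids and the `child_pid` helper are hypothetical): killing P makes S shut down T and then restart only P, since T is :temporary and is dropped rather than restarted.

```elixir
# S (:one_for_all) with a :permanent P and a :temporary T, as Agents.
children = [
  Supervisor.child_spec({Agent, fn -> :p end}, id: :p),
  Supervisor.child_spec({Agent, fn -> :t end}, id: :t, restart: :temporary)
]

{:ok, s} = Supervisor.start_link(children, strategy: :one_for_all)

# Hypothetical helper: pid of a running child with the given id, or nil.
child_pid = fn supervisor, id ->
  Enum.find_value(Supervisor.which_children(supervisor), fn
    {^id, pid, _type, _modules} when is_pid(pid) -> pid
    _other -> nil
  end)
end

p1 = child_pid.(s, :p)
t1 = child_pid.(s, :t)

# Kill P; S reacts by terminating T as part of the :one_for_all restart.
ref = Process.monitor(t1)
Process.exit(p1, :kill)

t_reason =
  receive do
    {:DOWN, ^ref, :process, ^t1, reason} -> reason
  end

IO.inspect(t_reason, label: "T exited with") # T exited with: :shutdown

# S finishes the restart before answering which_children/1.
p2 = child_pid.(s, :p)
t2 = child_pid.(s, :t)
```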

I just wanted to verify I wasn’t missing something completely obvious. I’ve tried to clarify the (Elixir) docs with respect to this situation: clarify behavior of `:temporary` processes within supervisor strategies by davidsulc · Pull Request #7589 · elixir-lang/elixir · GitHub