Supervisors - Delayed Restart

Hey there,

Is there a way to configure a Supervisor such that when a child terminates, it will wait a random amount of time before restarting it?

I am trying to build a realistic simulation in which GenServers randomly fail and come back. At the moment, restarts always seem to be instant, but in the real world this is not always the case. One thought was to add a random sleep in the start_link function of my GenServers. However, the Supervisor starts its children synchronously, so this would make the whole Supervisor sleep, which is also undesirable.

I’m looking forward to your insights!

If you need delays between restarts, I’d suggest managing that separately from the supervision tree and its restarts.

Could you elaborate?

Supervisor restarts are meant to keep your system in a working state, not in a partially down state. A parent supervisor does attempt to restart the child, but that’s the only mechanism it has to keep the system up. If restarts don’t help (the child keeps crashing past the supervisor’s restart intensity limit), the supervisor itself terminates and the failure propagates up the supervision tree.

This is not a tool for handling temporary outages, where waiting would also be an acceptable way to restore a working state. You’d want to encode the handling of such failures in your own business logic or use an alternative supervisor implementation.
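As a rough sketch of the business-logic route, a worker can come up immediately in a "down" state and schedule its own recovery with Process.send_after, so the supervisor is never involved in the delay. The module name FlakyWorker, the :max_delay_ms option, and the ping/1 function are hypothetical names made up for this illustration, not part of any library.

```elixir
defmodule FlakyWorker do
  # Hypothetical worker for illustration; names and timings are assumptions.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts)
  end

  # Returns :pong once the worker has recovered, {:error, :unavailable} before that.
  def ping(pid), do: GenServer.call(pid, :ping)

  @impl true
  def init(opts) do
    max_delay = Keyword.get(opts, :max_delay_ms, 5_000)
    # Schedule the recovery message instead of sleeping, so init returns at once.
    Process.send_after(self(), :recover, :rand.uniform(max_delay))
    {:ok, %{status: :down}}
  end

  @impl true
  def handle_call(:ping, _from, %{status: :up} = state), do: {:reply, :pong, state}
  def handle_call(:ping, _from, state), do: {:reply, {:error, :unavailable}, state}

  @impl true
  def handle_info(:recover, state), do: {:noreply, %{state | status: :up}}
end
```

With this shape the process stays responsive during the simulated outage and simply reports that it is unavailable, which may be closer to how a real degraded service behaves than a process that is missing altogether.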

Built-in supervisor? No, but there are other implementations that allow for that:

You can add a handle_continue clause to your GenServer and sleep there.
Something like:

def init(_args) do
  ...
  {:ok, state, {:continue, :random_sleep}}
end

def handle_continue(:random_sleep, state) do
  # :rand.uniform/1 returns an integer from 1 to N; sleep that many milliseconds
  Process.sleep(:rand.uniform(112_345))
  {:noreply, state}
end

This won’t block the supervisor, because handle_continue runs in the child process after init has already returned.
And yeah, in the real world restarts are pretty much instant too.
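To make the handle_continue idea concrete, here is a minimal sketch of a complete worker under a supervisor. The module name SlowStarter, the :ready? call, and the 200 ms bound are hypothetical and chosen only for this example. The :timer.tc call measures how long the supervisor takes to start, which should be much shorter than the sleep because the delay happens after init has returned.

```elixir
defmodule SlowStarter do
  # Hypothetical worker for illustration; the delay bound is an assumption.
  use GenServer

  def start_link(max_delay_ms) do
    GenServer.start_link(__MODULE__, max_delay_ms, name: __MODULE__)
  end

  @impl true
  def init(max_delay_ms) do
    # Returning here unblocks the supervisor; the delay happens afterwards.
    {:ok, %{ready: false}, {:continue, {:random_sleep, max_delay_ms}}}
  end

  @impl true
  def handle_continue({:random_sleep, max_delay_ms}, state) do
    # Runs inside the child process, before any other message is handled.
    Process.sleep(:rand.uniform(max_delay_ms))
    {:noreply, %{state | ready: true}}
  end

  @impl true
  def handle_call(:ready?, _from, state), do: {:reply, state.ready, state}
end

# Time the supervisor start; elapsed_us is in microseconds.
{elapsed_us, {:ok, _sup}} =
  :timer.tc(fn ->
    Supervisor.start_link([{SlowStarter, 200}], strategy: :one_for_one)
  end)

IO.puts("supervisor started in #{elapsed_us} µs")
```

One caveat with this approach: while the child sleeps it is alive and registered but cannot handle messages, so any GenServer.call made during the delay queues up and will hit the default 5 second call timeout if the sleep is longer than that.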