Is there a way to configure a Supervisor such that when a child terminates, it will wait a random amount of time before restarting it?
I am trying to build a realistic simulation in which GenServers randomly fail and come back. At the moment, restarts always seem to be instant, but in the real world this is not always the case. One thought was to add a random sleep in the start_link function of my GenServers. However, since the Supervisor starts its children synchronously, this would block the whole Supervisor, which is also undesirable.
Supervisor restarts are meant to keep your system in a working state, not in a partially down state. A parent supervisor does attempt to restart the child, but that’s the only mechanism it has to keep the system up. If restarts don’t help, the failure will propagate up the supervision tree.
This is not a tool for handling temporary outages, where waiting would also be an acceptable way to restore a working state. You’d want to encode the handling of such failures in your own business logic, or use an alternative supervisor implementation.
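As a rough illustration of handling this in business logic rather than through the supervisor, here is a minimal sketch in which the process never actually crashes: it marks itself as down and schedules its own recovery after a random delay with Process.send_after/3. The module name SimulatedWorker, the :max_downtime_ms option and the fail/1 and status/1 functions are all made up for this example.

```elixir
defmodule SimulatedWorker do
  use GenServer

  # Hypothetical API for the simulation; none of these names come from the thread.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)
  def fail(pid), do: GenServer.cast(pid, :fail)
  def status(pid), do: GenServer.call(pid, :status)

  @impl true
  def init(opts) do
    max_downtime_ms = Keyword.get(opts, :max_downtime_ms, 5_000)
    {:ok, %{status: :up, max_downtime_ms: max_downtime_ms}}
  end

  @impl true
  def handle_cast(:fail, %{status: :up} = state) do
    # Simulate an outage: go down now, come back after 1..max_downtime_ms ms.
    # send_after/3 only schedules a message, so nothing blocks here.
    Process.send_after(self(), :recover, :rand.uniform(state.max_downtime_ms))
    {:noreply, %{state | status: :down}}
  end

  # Already down: ignore further failure requests.
  def handle_cast(:fail, state), do: {:noreply, state}

  @impl true
  def handle_call(:status, _from, state), do: {:reply, state.status, state}

  @impl true
  def handle_info(:recover, state), do: {:noreply, %{state | status: :up}}
end
```

With this approach the supervisor is not involved in the simulated outage at all, so its restart limits are not consumed by it. Whether that is realistic enough depends on what the simulation is supposed to measure.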
You can add a handle_continue clause in your GenServer and sleep there.
Something like:
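def init(_args) do
  ...
  # Return right away so the supervisor is not kept waiting
  {:ok, state, {:continue, :random_sleep}}
end

def handle_continue(:random_sleep, state) do
  # :rand.uniform/1 returns a random integer between 1 and the given number
  Process.sleep(:rand.uniform(112345))
  {:noreply, state}
end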
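This won’t block the supervisor, because init returns immediately and the sleep happens afterwards, inside the GenServer process itself.
And yeah, in the real world restarts are pretty much instant too.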
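For reference, a self-contained version of the handle_continue idea might look like the sketch below. The module name SlowStarter, the :max_delay_ms option and the ping/0 function are assumptions added for illustration only.

```elixir
defmodule SlowStarter do
  use GenServer

  # Hypothetical names: SlowStarter, :max_delay_ms and ping/0 are not from the thread.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def ping, do: GenServer.call(__MODULE__, :ping)

  @impl true
  def init(opts) do
    state = %{max_delay_ms: Keyword.get(opts, :max_delay_ms, 2_000)}
    # init/1 returns immediately, so the supervisor can carry on.
    {:ok, state, {:continue, :random_sleep}}
  end

  @impl true
  def handle_continue(:random_sleep, state) do
    # Runs inside the GenServer right after init/1, before any other message.
    Process.sleep(:rand.uniform(state.max_delay_ms))
    {:noreply, state}
  end

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

# The supervisor starts (and later restarts) the child without waiting for the sleep.
{:ok, sup} = Supervisor.start_link([{SlowStarter, max_delay_ms: 200}], strategy: :one_for_one)
```

Note that the process is technically alive during the sleep, it just does not handle messages yet. Calls such as SlowStarter.ping() made in that window queue up and are answered afterwards, or time out if the delay exceeds the GenServer.call timeout (5 seconds by default).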