Can you provide a link to your talk? I would really like to hear your opinion on this!
It isn’t always practical to introduce a delay, but in cases where it is, I’ve used the following trick to allow a failing worker to “keep trying indefinitely” without hitting max restart intensity. In my worker, I use Process.send_after or :timer.sleep() to introduce a delay before executing the code that might fail. If the delay is greater than the max_seconds option you passed to Supervisor.start_link/2, then even if the worker fails repeatedly, it won’t fail frequently enough to exceed max restart intensity. It’s not elegant, but it is simple. Obviously this only suits certain cases; often it won’t be acceptable to introduce delays.
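To make the trick concrete, here is a minimal sketch of how it could look. The module name DelayedWorker, the :work and :delay_ms options, and the numbers are all made up for illustration; the only real requirement is that the delay (6 seconds here) is longer than max_seconds (5 here).

```elixir
defmodule DelayedWorker do
  # Hypothetical worker: waits `delay_ms` before running the code that may fail.
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    work = Keyword.fetch!(opts, :work)
    delay_ms = Keyword.fetch!(opts, :delay_ms)

    # Schedule the risky work instead of running it right away.
    # Unlike :timer.sleep/1, this does not block init/1, so the
    # supervisor is not held up while the worker waits.
    Process.send_after(self(), :attempt, delay_ms)
    {:ok, work}
  end

  @impl true
  def handle_info(:attempt, work) do
    # If this raises, the process crashes and the supervisor restarts it,
    # which goes through init/1 again and therefore waits again.
    work.()
    {:noreply, work}
  end
end

children = [
  # The failing function is a stand-in for whatever might fail in a real worker.
  {DelayedWorker, delay_ms: 6_000, work: fn -> raise "still failing" end}
]

# Crashes are at least 6 seconds apart, so there is never more than one
# restart inside any 5 second window, and max_restarts is never exceeded.
{:ok, _sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    max_restarts: 3,
    max_seconds: 5
  )
```

With these settings the worker crashes roughly every 6 seconds and is restarted each time, while the supervisor itself stays up.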
There is also supervisor3 from Klarna (based on RabbitMQ’s version) if you are willing to use Erlang.
supervisor3 can restart a child after a delay, and keep doing so indefinitely.
From the README: "Child specifications can contain, as the restart type, a tuple {permanent, Delay}"
For example, the Kafka client brod uses it here.
Saved me a ton of time. Thank you very much for this hack.
@brucepomeroy