I’ve encountered a Supervisor behaviour I find surprising.
In short, a library I’m working on had an issue where the entire app would shut down after too many errors in a single process. All the processes are in a supervision tree with multiple nested supervisors,
strategy: :one_for_one and
I appreciate that it might be the sensible thing to do, but I’m having trouble finding documentation on this.
My specific problem was that I had forgotten to declare a handle_info clause in a GenServer. I’m using Redis pub-sub and I was testing what happens when the Redis server is terminated while the Mix app is still running.
The library I’m using, redix_pub_sub, requires a running GenServer to maintain the connection with Redis. That GenServer receives a :disconnect message if something interrupts the connection. Since I wasn’t handling it, stopping Redis caused a stream of errors.
The process would crash with a FunctionClauseError for the missing handle_info, then be restarted as expected, and then crash again for the same reason. I could verify that it would be restarted three times before a complete system failure.
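For illustration, here is a minimal sketch of the kind of clause that was missing. The module name MyApp.RedisListener and the state shape are hypothetical, and I am not reproducing the exact message redix_pub_sub sends. Instead, a catch-all handle_info clause logs anything unexpected and keeps the process alive. This assumes it is placed after any more specific clauses.

```elixir
defmodule MyApp.RedisListener do
  # Hypothetical module: stands in for the GenServer that owns the pub-sub connection.
  use GenServer
  require Logger

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    # Hypothetical state: counts messages that no specific clause handled.
    {:ok, %{unexpected: 0}}
  end

  # Catch-all clause. Without it, a message that matches no clause raises
  # FunctionClauseError and crashes the process.
  @impl true
  def handle_info(message, state) do
    Logger.warning("Unhandled message: #{inspect(message)}")
    {:noreply, %{state | unexpected: state.unexpected + 1}}
  end
end
```

With a clause like this in place, an unexpected message is logged and counted instead of crashing the process.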
After three restarts and errors, the application would give up and terminate with this message printed on the console:
[info] Application my_app_name exited: shutdown
This would affect that specific supervision tree only, and not the rest. If run inside
iex -S mix, for example, I could restart it with
Implementing the missing function clause solves the immediate problem, of course, but I am wondering if I can control the “shutdown everything” behaviour.
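For reference, Supervisor accepts :max_restarts and :max_seconds options, which default to 3 and 5. That would match the three restarts I observed, though I have not confirmed that this is what triggers the shutdown in my tree. A minimal sketch of where such a limit could be tuned follows. The module name MyApp.TolerantSupervisor and the values 10 and 60 are arbitrary placeholders.

```elixir
defmodule MyApp.TolerantSupervisor do
  # Hypothetical supervisor module, used only to show where the options go.
  use Supervisor

  def start_link(children) do
    Supervisor.start_link(__MODULE__, children)
  end

  @impl true
  def init(children) do
    Supervisor.init(children,
      strategy: :one_for_one,
      # Allow up to 10 restarts within any 60-second window
      # (placeholder values, the defaults are 3 and 5).
      max_restarts: 10,
      max_seconds: 60
    )
  end
end
```

Raising the limit only delays the shutdown if a child keeps crashing for the same reason, so it does not replace fixing the underlying error.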
I can reproduce the issue by triggering other errors in quick sequence.
For example, adding a 1 / 0 in a function that I can trigger manually is “tolerated” and the supervisors do their job. Adding it in a callback that is invoked multiple times when a process starts, on the other hand, will take everything down.
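The second case can be sketched roughly as follows. This is a simplified stand-in, not my actual code. CrashDemo and CrashDemo.Worker are hypothetical names, and the worker divides by zero in a message it sends itself on every start, so each restart crashes again immediately. The supervisor uses the default restart limits.

```elixir
defmodule CrashDemo.Worker do
  # Hypothetical worker that crashes shortly after every (re)start.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, 0)

  @impl true
  def init(divisor) do
    # Schedule the failing work right after a successful start.
    send(self(), :divide)
    {:ok, divisor}
  end

  @impl true
  def handle_info(:divide, divisor) do
    # divisor is 0, so this raises ArithmeticError at runtime.
    {:noreply, 1 / divisor}
  end
end

defmodule CrashDemo do
  # Starts a supervisor with default limits and returns the reason it exits with.
  def run do
    # Trap exits so the linked supervisor's shutdown arrives as a message
    # instead of taking the calling process down with it.
    Process.flag(:trap_exit, true)
    {:ok, sup} = Supervisor.start_link([CrashDemo.Worker], strategy: :one_for_one)

    receive do
      {:EXIT, ^sup, reason} -> reason
    after
      5_000 -> :still_running
    end
  end
end
```

Calling CrashDemo.run() prints several crash reports and then returns the reason the supervisor itself exited with, which makes the behaviour easy to inspect in isolation.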
As I said, this might be the most sensible thing to do, but I’d like to learn more.