Why doesn't a Supervisor worker get auto-restarted after `Supervisor.terminate_child` is called on it?

I have been writing my own job pool library lately and I’ve stumbled upon something I didn’t expect.

It sets up a typical Supervisor with several GenServer workers attached to it. Supervisor.which_children works fine, Process.registered shows both the supervisor and all workers and I can use Process.whereis and Process.info on their names. I can send messages to each worker and it responds fine.

Doing Process.exit(worker_pid, :kill) results in the supervisor restarting the worker as expected. (Both the supervisor and the workers have the restart: :permanent option explicitly specified, too.)

What does surprise me, however, is that calling Supervisor.terminate_child leaves the worker’s child spec in the supervisor (meaning Supervisor.which_children still shows it, but with an :undefined PID)… and the worker process is NOT restarted. The docs of Supervisor.terminate_child don’t seem to address this. I’d expect the normal OTP auto-restart guarantees to apply.
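For anyone who wants to reproduce this, here is a minimal sketch of the behaviour described above. The module name `TerminateDemo.Worker` and the registered name `:terminate_demo_worker` are made up for illustration and are not from my library; the supervisor is a plain `:one_for_one` one with a single `:permanent` worker.

```elixir
defmodule TerminateDemo.Worker do
  # Hypothetical do-nothing worker, registered under the name it is given.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

children = [
  # Override the default id and make the restart option explicit.
  Supervisor.child_spec({TerminateDemo.Worker, :terminate_demo_worker},
    id: :terminate_demo_worker,
    restart: :permanent
  )
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
old_pid = Process.whereis(:terminate_demo_worker)

# Ask the supervisor itself to stop the child.
:ok = Supervisor.terminate_child(sup, :terminate_demo_worker)

# The child spec is still listed, but there is no process behind it.
[{:terminate_demo_worker, :undefined, :worker, [TerminateDemo.Worker]}] =
  Supervisor.which_children(sup)

false = Process.alive?(old_pid)

# The worker only comes back when the supervisor is explicitly told to restart it.
{:ok, new_pid} = Supervisor.restart_child(sup, :terminate_demo_worker)
```

The pattern matches act as assertions, so the script should crash if the supervisor behaves differently from what is described here.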

Any clues or pointers? As a future library author I’d be worried that my users can just find the workers via Process.registered and stop them via Supervisor.terminate_child and then have their job pool execution code crash because one or more of the worker processes are not there.

I mean, if they want to use the library they should refrain from such shenanigans, obviously, but is there a way to make sure the workers are restarted even when stopped with Supervisor.terminate_child?


If you explicitly ask the supervisor to terminate a child, I find it logical that it does not restart it. You may rather use GenServer.stop or other means to tell the child to terminate, in which case the supervisor will do its job.
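A rough sketch of that difference, assuming a `:permanent` worker under a `:one_for_one` supervisor. The names `StopDemo.Worker` and `:stop_demo_worker` are hypothetical. Since the restart happens asynchronously after GenServer.stop returns, the sketch polls for the new PID for up to about a second.

```elixir
defmodule StopDemo.Worker do
  # Hypothetical do-nothing worker, registered under the name it is given.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

children = [
  Supervisor.child_spec({StopDemo.Worker, :stop_demo_worker},
    id: :stop_demo_worker,
    restart: :permanent
  )
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
old_pid = Process.whereis(:stop_demo_worker)

# Stop the worker directly instead of going through the supervisor.
:ok = GenServer.stop(old_pid)

# The supervisor sees the exit and restarts the :permanent child on its own.
# Poll until a different PID is registered under the same name.
new_pid =
  Enum.find_value(1..100, fn _ ->
    case Process.whereis(:stop_demo_worker) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)
```

Here `new_pid` should end up as a fresh PID without any call to Supervisor.restart_child, because the supervisor was never told that the child should stay down.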

You can think of the supervisor’s child specs as “the desired state of the application”; the supervisor’s job is to make sure that the actual state matches the desired state. Supervisor.terminate_child is a way to change the desired state.


Apparently you’re correct.

Oh well, protecting processes isn’t going as far as I wanted but it’s still covering 99% of what I need so I guess that’s that.