I have been writing my own job pool library lately and stumbled upon something I didn’t expect.
It sets up a typical Supervisor with several GenServer workers attached to it. Supervisor.which_children works fine, Process.registered shows both the supervisor and all workers, and I can use Process.info on their names. I can send messages to each worker and it responds fine. Process.exit(worker_pid, :kill) results in the supervisor restarting the worker as expected. (Both the supervisor and the workers have the restart: :permanent option explicitly specified, too.)
What does surprise me, however, is that calling Supervisor.terminate_child leaves the worker’s child spec in the supervisor (meaning Supervisor.which_children still shows it, but with an :undefined PID)… and the worker process is NOT restarted. The docs for Supervisor.terminate_child don’t seem to address this. I’d expect the normal OTP auto-restart guarantees to apply.
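Here is a minimal sketch of the behaviour described above, in case anyone wants to try it. The names Repro, Repro.Worker and :repro_worker_1 are made up for illustration and are not the library's actual code; the sketch assumes a single permanent worker under a :one_for_one supervisor.

```elixir
defmodule Repro.Worker do
  use GenServer

  # Hypothetical worker: registers itself under the given name.
  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule Repro do
  @name :repro_worker_1

  def run do
    children = [
      Supervisor.child_spec({Repro.Worker, @name}, id: @name, restart: :permanent)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    # Case 1: brutal kill. The supervisor starts a fresh process.
    old_pid = Process.whereis(@name)
    Process.exit(old_pid, :kill)
    new_pid = await_new_pid(@name, old_pid)

    # Case 2: terminate_child. The child spec stays, the process does not.
    :ok = Supervisor.terminate_child(sup, @name)
    after_terminate = Supervisor.which_children(sup)

    Supervisor.stop(sup)

    %{restarted_after_kill: new_pid != old_pid, after_terminate: after_terminate}
  end

  # Polls until the name is registered to a pid other than the killed one.
  defp await_new_pid(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid)
    end
  end
end
```

Calling Repro.run() returns a map whose :after_terminate entry still lists the child id, with :undefined in place of the PID.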
Any clues or pointers? As a future library author I’d be worried that my users can just find the workers via Process.registered and stop them via Supervisor.terminate_child, and then have their job pool execution code crash because one or more of the worker processes are missing.
I mean, if they want to use the library they should refrain from such shenanigans, obviously, but is there a way to make sure the workers are restarted even when stopped with