I have been writing my own job pool library lately and I’ve stumbled upon something I didn’t expect.
It sets up a typical Supervisor with several GenServer workers attached to it. Supervisor.which_children works fine, Process.registered shows both the supervisor and all the workers, and I can use Process.whereis and Process.info on their names. I can send messages to each worker and each one responds fine.
Doing Process.exit(worker_pid, :kill) results in the supervisor restarting the worker as expected. (Both the supervisor and the workers have the restart: :permanent option explicitly specified, too.)
What does surprise me, however, is that calling Supervisor.terminate_child leaves the worker’s child spec in the supervisor (Supervisor.which_children still shows it, but with an :undefined PID)… and the worker process is NOT restarted. The docs of Supervisor.terminate_child don’t seem to address this. I’d expect the normal OTP auto-restart guarantees to apply.
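For reference, here is a minimal sketch that reproduces the behaviour in a plain script. The names PoolDemo, PoolDemo.Worker, :demo_worker and await_new_pid are made up for this example and are not taken from the library; the polling helper is only there to avoid racing the supervisor's restart.

```elixir
# Hypothetical worker, standing in for one of the pool's GenServers.
defmodule PoolDemo.Worker do
  use GenServer

  # Registers the worker under a name so it shows up in Process.registered/0.
  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule PoolDemo do
  # Hypothetical helper: polls until `name` is registered to a pid other than `old_pid`.
  def await_new_pid(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid)
    end
  end
end

children = [
  Supervisor.child_spec({PoolDemo.Worker, :demo_worker}, id: :demo_worker, restart: :permanent)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Case 1: kill the process directly. The supervisor starts a replacement.
old_pid = Process.whereis(:demo_worker)
Process.exit(old_pid, :kill)
new_pid = PoolDemo.await_new_pid(:demo_worker, old_pid)
IO.inspect(new_pid != old_pid, label: "restarted after :kill")

# Case 2: stop it through the supervisor. The child spec stays, the process does not come back.
:ok = Supervisor.terminate_child(sup, :demo_worker)
children_after = Supervisor.which_children(sup)
IO.inspect(children_after, label: "after terminate_child")
# after terminate_child: [{:demo_worker, :undefined, :worker, [PoolDemo.Worker]}]
```

Calling Supervisor.restart_child(sup, :demo_worker) afterwards does start the worker again, but only because it is invoked explicitly; nothing restarts it automatically.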
Any clues or pointers? As a future library author, I’m worried that my users could find the workers via Process.registered, stop them via Supervisor.terminate_child, and then have their job pool execution code crash because one or more of the worker processes are not there.
I mean, if they want to use the library they should refrain from such shenanigans, obviously, but is there a way to make sure the workers are restarted even when stopped with Supervisor.terminate_child?