I’ve been writing my own job pool library lately and stumbled upon something I didn’t expect.
It sets up a typical Supervisor with several GenServer workers attached to it. Supervisor.which_children works fine, Process.registered shows both the supervisor and all workers, and I can use Process.whereis and Process.info on their names. I can send messages to each worker and it responds fine. Doing Process.exit(worker_pid, :kill) results in the supervisor restarting the worker as expected. (Both the supervisor and the workers have the restart: :permanent option explicitly specified, too.)
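For reference, here is a minimal sketch of the kind of setup I mean, not the actual library code. The module names (PoolSketch.Worker, PoolSketch.Wait), the process names and the :ping message are made up for illustration, and the polling helper only exists because the restart happens asynchronously.

```elixir
# Hypothetical worker; module, process names and the :ping call are illustrative only.
defmodule PoolSketch.Worker do
  use GenServer, restart: :permanent

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

# Hypothetical helper: polls until the registered name points at a new pid.
defmodule PoolSketch.Wait do
  def for_restart(name, old_pid, attempts \\ 50) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ when attempts > 0 ->
        Process.sleep(10)
        for_restart(name, old_pid, attempts - 1)

      _ ->
        :timeout
    end
  end
end

# Two workers under one supervisor, each with its own child id and registered name.
children =
  for name <- [:worker_1, :worker_2] do
    Supervisor.child_spec({PoolSketch.Worker, name}, id: name)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one, name: :pool_sketch_sup)

old_pid = Process.whereis(:worker_1)
:pong = GenServer.call(:worker_1, :ping)

# Killing the worker directly: the supervisor starts a replacement under the same name.
Process.exit(old_pid, :kill)
new_pid = PoolSketch.Wait.for_restart(:worker_1, old_pid)
```

With this, new_pid ends up being a fresh process registered under :worker_1, which is the restart behaviour I expected everywhere.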
What does surprise me, however, is that calling Supervisor.terminate_child leaves the worker’s child spec in the supervisor (meaning Supervisor.which_children still shows it, but with an :undefined PID)… and the worker process is NOT restarted. The docs of Supervisor.terminate_child don’t seem to address this. I’d expect the normal OTP auto-restart guarantees to apply.
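A minimal sketch that reproduces what I am seeing. An Agent stands in for my GenServer worker to keep it short, and the :worker_1 id and name are hypothetical. The last step assumes that an explicit Supervisor.restart_child call is how the worker would be brought back by hand, which is not the automatic restart I was hoping for.

```elixir
# An Agent stands in for the GenServer worker; the id and name are hypothetical.
children = [
  %{
    id: :worker_1,
    start: {Agent, :start_link, [fn -> 0 end, [name: :worker_1]]},
    restart: :permanent
  }
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Stop the worker through the supervisor, addressing it by child id.
:ok = Supervisor.terminate_child(sup, :worker_1)

# The child spec is still listed, but with :undefined in place of the pid.
[{:worker_1, :undefined, _type, _modules}] = Supervisor.which_children(sup)

# The worker only comes back when it is restarted explicitly by child id.
{:ok, new_pid} = Supervisor.restart_child(sup, :worker_1)
```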
Any clues or pointers? As a future library author, I’d be worried that my users could just find the workers via Process.registered and stop them via Supervisor.terminate_child, and then have their job pool execution code crash because one or more of the worker processes are no longer there.
I mean, if they want to use the library they should refrain from such shenanigans, obviously, but is there a way to make sure the workers are restarted even when stopped with Supervisor.terminate_child?