Hi folks,
I want to gracefully shut down a child of a dynamic supervisor (the parent). The child is a continuously running state machine that uses the gen_statem behavior, and it traps exits so that it can catch signals from its parent. The parent uses the following to send a signal to the child: Process.exit(pid, :normal)
I expected the child to terminate; this was not the case and it continues to run code.
From reading the following links (link_1, link_2 and link_3), it seems that Process.exit(pid, :normal) does not, should not, or cannot work, and that I should use :kill as the reason. The documentation is confusing for my case. Any help is much appreciated.
The only way for an external process to stop an exit-trapping process (ungracefully) is to use the :kill reason. Any other exit signal is converted into a message for the exit-trapping process to handle, and the process decides, based on that message, whether to shut down.
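To make the difference concrete, here is a minimal sketch using a plain process rather than a gen_statem. The looping process and the :ready and :trapped messages are made up for illustration; only Process.flag/2, Process.exit/2 and Process.monitor/1 are real API.

```elixir
parent = self()

# A plain process that traps exits and reports every exit signal it receives.
pid =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    loop = fn loop ->
      receive do
        {:EXIT, from, reason} ->
          send(parent, {:trapped, from, reason})
          loop.(loop)
      end
    end

    loop.(loop)
  end)

# Wait until the flag is set before sending any signal.
receive do
  :ready -> :ok
end

# :normal is trapped: it arrives as a message and the process keeps running.
Process.exit(pid, :normal)

receive do
  {:trapped, _from, :normal} ->
    IO.puts("trapped :normal, still alive: #{Process.alive?(pid)}")
end

# :kill cannot be trapped: the process dies with reason :killed.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} -> IO.inspect(reason, label: :down_reason)
end
```

The first signal only produces a message, while the second one ends the process without giving it a chance to react.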
Hello @LostKobrakai
Thank you for reaching out. You pointed to the crux of the problem. Why does the child ignore or drop the exit signal with reason :normal from its parent? Do you have any suggestions for debugging the issue? For my use case, it’s better for the child process to go down gracefully.
I have not been able to reach a meaningful conclusion after reading the docs.
Hi @hst337
Thanks for reaching out. This is where I send the exit signal to the child from the dynamic supervisor. It’s part of a function which issues the terminate signal to the child process and removes artifacts related to the child:
def clean_up(gv_spec) do
  # cleanup
  case :pg.get_members(gv_spec.id) do
    [] ->
      Logger.info(%{msg: "there is no gv for given instance", id: gv_spec.id})

    [pid | _] ->
      Process.exit(pid, :normal)
  end

  # cleanup continued
end
terminate is only called when the process actually shuts down. The exit message would be handled in a handle_event with event type :info (handle_info elsewhere).
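As a rough illustration of that point, the sketch below uses :gen_statem directly and stops when it sees an exit message, which is what makes terminate/3 run. The module name Demo.Stoppable, the :running state and the {:terminated, reason} report message are hypothetical, and stopping on every exit message is just one possible policy.

```elixir
defmodule Demo.Stoppable do
  @behaviour :gen_statem

  # Started without a link, so the calling process is not the parent.
  def start(report_to), do: :gen_statem.start(__MODULE__, report_to, [])

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, :running, report_to}
  end

  @impl true
  def handle_event(:info, {:EXIT, _from, reason}, _state, _data) do
    # The trapped exit signal arrives here as an :info event.
    # Returning {:stop, reason} is what actually shuts the process down.
    {:stop, reason}
  end

  @impl true
  def terminate(reason, _state, report_to) do
    # Runs only because handle_event/4 decided to stop.
    send(report_to, {:terminated, reason})
  end
end

{:ok, pid} = Demo.Stoppable.start(self())
ref = Process.monitor(pid)

Process.exit(pid, :normal)

receive do
  {:terminated, reason} -> IO.inspect(reason, label: :terminate_called_with)
end

receive do
  {:DOWN, ^ref, :process, ^pid, reason} -> IO.inspect(reason, label: :down_reason)
end
```

If the clause returned :keep_state_and_data instead, the process would keep running and terminate/3 would never be called.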
Thanks for the feedback @LostKobrakai. My goal was to demonstrate that the terminate function has been implemented. As you pointed out, since the process does not terminate as expected, it doesn’t get called. To address your original point, a handle_event with event type :info is present in the code base.
I can assure you that this code is not called from the dynamic supervisor process. Perhaps it is called from the dynamic supervisor module, but not from the process. There is actually no non-hacky way to call anything from inside a supervisor process (unless it is a supervisor written from scratch).
Regarding trap_exit:
Any OTP-compliant process that traps exits behaves this way:
If the exit signal comes from the parent process (the parent can be checked with Process.get() inside the child, where it is listed under the :"$ancestors" key), the child terminates with that reason, much as if it were not trapping exits. If the exit signal comes from any non-parent process, it is delivered as a message to handle_info (or, in the case of gen_statem, to handle_event or the state function).
Mix.install([:gen_state_machine])

defmodule Server do
  use GenStateMachine, callback_mode: :state_functions

  def start_link(opts \\ []) do
    GenStateMachine.start_link(__MODULE__, opts)
  end

  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, :state, opts}
  end

  def state(:info, message, _data) do
    IO.inspect(message, label: :received)
    :keep_state_and_data
  end

  def terminate(reason, :state, _data) do
    IO.inspect(reason, label: :terminating)
  end
end

DynamicSupervisor.start_link(name: Sup)
{:ok, child} = DynamicSupervisor.start_child(Sup, Server)
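The parent versus non-parent rule can be exercised with a self-contained sketch along the following lines. It uses :gen_statem from OTP directly so that it needs no dependency. The module name Demo.Statem, the :running state and the report messages are hypothetical, and DynamicSupervisor.terminate_child/2 is assumed here as the way to have the supervisor process itself send the signal.

```elixir
defmodule Demo.Statem do
  @behaviour :gen_statem

  def start_link(report_to), do: :gen_statem.start_link(__MODULE__, report_to, [])

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(report_to) do
    Process.flag(:trap_exit, true)
    {:ok, :running, report_to}
  end

  @impl true
  def handle_event(:info, {:EXIT, _from, reason}, :running, report_to) do
    # Only exit signals from non-parent processes end up here.
    send(report_to, {:exit_as_message, reason})
    :keep_state_and_data
  end

  @impl true
  def terminate(reason, _state, report_to) do
    send(report_to, {:terminate_called, reason})
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

spec = %{id: Demo.Statem, start: {Demo.Statem, :start_link, [self()]}, restart: :temporary}
{:ok, child} = DynamicSupervisor.start_child(sup, spec)

# Signal from a non-parent process (this script): delivered as an :info event.
Process.exit(child, :normal)

receive do
  {:exit_as_message, :normal} -> IO.puts("non-parent exit arrived as a message")
end

# Signal from the parent (the supervisor process): the child terminates.
:ok = DynamicSupervisor.terminate_child(sup, child)

receive do
  {:terminate_called, reason} -> IO.inspect(reason, label: :terminate_called_with)
end
```

The first signal leaves the child running, while the second one goes through terminate/3 and stops it.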
Hello again @hst337
Thank you very much for your guidance. I’ve encountered an interesting case and it would be great if I could have your feedback.
While reading the logs, I noticed that a child was terminated 10 hours after terminate_child/2 was issued. The child in question is terminated from an external process other than the parent; it uses the gen_statem behavior and has two timeout events with a resolution of 10 minutes. In these timeout events, it opens an external file (and therefore uses an external resource).
Is termination postponed while the child is reading an external file when using terminate_child/2?
Or if termination is issued during a timeout? 10 hours is a huge gap and I don’t understand what the problem could be.
No, the termination is not postponed in any case. You need to take into account that messages received before the termination request are processed before the terminate callback is called.
If you want to kill your child instantly, you should delete the child and kill it with the :kill reason. In this case, the terminate callback will not run.
Anyway, this is an XY problem, since you’re trying to do resource management by relying on the terminate callback.
You need to know two things:
1. The terminate callback is for optimistic cleanup. This means that the callback is called only when the exception can be handled. Some cases, like hardware failures, out-of-memory errors and infinite loops, are not covered by terminate callbacks. That’s why you should not rely on the terminate callback.
2. The idiomatic way to do resource management is a resource pool or the observer pattern. The latter is much easier to implement: it is basically a separate process that hosts the resource and monitors the process using it. In this case, when the user of the resource dies for any reason or gets stuck in an infinite loop, your observing process will be able to close or clean up the resource.
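The observer pattern described above could look roughly like this. It is a simplified sketch: ResourceObserver and its cleanup argument are hypothetical names, and instead of hosting a real resource the observer just receives a zero-arity function that releases it. It covers only the case where the user process dies; detecting a process stuck in a loop would need something extra, such as a timeout, which is not shown.

```elixir
defmodule ResourceObserver do
  use GenServer

  # `cleanup` is a zero-arity function that releases the resource.
  def start_link(user_pid, cleanup) when is_pid(user_pid) and is_function(cleanup, 0) do
    GenServer.start_link(__MODULE__, {user_pid, cleanup})
  end

  @impl true
  def init({user_pid, cleanup}) do
    # Monitor (not link) the user, so its death arrives as a :DOWN message.
    ref = Process.monitor(user_pid)
    {:ok, %{ref: ref, cleanup: cleanup}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref, cleanup: cleanup} = state) do
    # The user is gone, whatever the reason: release the resource and stop.
    cleanup.()
    {:stop, :normal, state}
  end
end

test = self()
user = spawn(fn -> Process.sleep(:infinity) end)

{:ok, _observer} =
  ResourceObserver.start_link(user, fn -> send(test, :resource_released) end)

# Even an untrappable kill is seen by the observer.
Process.exit(user, :kill)

receive do
  :resource_released -> IO.puts("resource released after the user died")
end
```

Because the cleanup runs in a different process, it does not depend on the user process getting a chance to run its own terminate callback.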