I’ve reviewed the other topics here and done a bunch of googling and haven’t been able to find out the right way to deal with this.
I have about 4-5 `DynamicSupervisor`s. Each of those spawns `GenServer`s as children via `DynamicSupervisor.start_child`.
What I then need to be able to do is expose a function in the supervisor that will terminate a child by key, so it needs to: a) look up the child (this works fine), b) terminate the child.
The problem is I can’t figure out how to “cleanly” and “idiomatically” terminate the child process. In my `terminate_child_by_key(foo)` function, what’s the accepted way of shutting that pid down? Not only do I need to shut that pid down, but I need to be able to publish a message on my broker so that I can emit the “child died” event.
Since I’m going to do this over and over again, I want to do it right. I’ve tried using `Process.flag(..)` in the child as a way of getting advance notice that the child is going to die, but that `handle_info` never gets called. When I use the process flag in the supervisor, the supervisor never gets called during child death.
Additionally, for this scenario, I can’t use transient children (?) because they come back immediately after I kill them… the behavior I want is when I choose to kill the child, it’ll stay dead, but when it dies due to exception failure, it’ll restart.
Any advice would be greatly appreciated as I’ve been doing this all in a very ugly fashion and I don’t want to continue repeating that same ugliness all over my code base.
Have you tried using `DynamicSupervisor.terminate_child/2`?
If you’re stopping a `transient` child cleanly then it should not be restarted.
I think GenServers don’t trap exits by default, but once you set `Process.flag(:trap_exit, true)` in `init/1`, a shutdown request from the supervisor is handled by the `terminate/2` callback rather than by `handle_info`.
Also, I would say the idiomatic thing to do in Elixir is to spin up a `Registry` and send a command to the GenServer telling it to self-destruct (something like `def handle_call(:stop, _, state), do: {:stop, :normal, :ok, state}`). Trawling through a supervisor’s child list seems like a very ungraceful thing to do.
As an aside, IIRC you should supervise with `restart: :transient`, but note that the “reason” for shutdown matters: `:normal` or `:shutdown` will not trigger a restart; everything else (exceptions, kills, brutal_kills, custom reasons) will.
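To make the Registry idea concrete, here is a minimal sketch, not a definitive implementation. `Demo.Registry`, `Demo.Worker`, and the `"foo"` key are hypothetical names invented for illustration; the worker registers itself under its key with a `:via` tuple, so callers never need to hold the pid.

```elixir
defmodule Demo.Worker do
  # :transient means "restart only after an abnormal exit".
  use GenServer, restart: :transient

  def start_link(key) do
    GenServer.start_link(__MODULE__, key, name: via(key))
  end

  # Ask the worker registered under `key` to shut itself down.
  def stop(key), do: GenServer.call(via(key), :stop)

  # Demo.Registry is a hypothetical registry name for this sketch.
  defp via(key), do: {:via, Registry, {Demo.Registry, key}}

  @impl true
  def init(key), do: {:ok, key}

  @impl true
  def handle_call(:stop, _from, state) do
    # Reply :ok, then exit with reason :normal so the supervisor leaves it dead.
    {:stop, :normal, :ok, state}
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: Demo.Registry)
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {Demo.Worker, "foo"})

stop_result = Demo.Worker.stop("foo")
```

The call returns as soon as the reply is sent, so the process may still be finishing its exit for a brief moment afterwards.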
If the process is actively handling messages (versus being blocked in a `call` or similar), consider adding an explicit “hey could you please shut down” message to the GenServer’s public API. The `handle_call` head for that can return `{:stop, :shutdown, :ok, state}` and the GenServer will exit with reason `:shutdown`.
Otherwise, signaling from outside is done via `Process.exit/2` and functions built using it. It’s useful to understand exactly what “signaling a process to exit” means:
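- If any Erlang process gets an exit signal with a reason of `:kill`, it will exit immediately; the signal cannot be trapped, and links and monitors see the reason `:killed`.
- If a process that isn’t trapping exits gets an exit signal with any reason other than `:normal`, it will exit immediately with the same reason.
- If a GenServer is trapping exits and the signal comes from its parent (typically its supervisor), the built-in handler from `gen_server` will invoke the `terminate/2` callback with the reason. Exit signals from other processes arrive as `{:EXIT, pid, reason}` messages in `handle_info/2`.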
(The above is summarized from “The many and varied ways to kill an OTP Process” on The furlough log of Paul Wilson.)
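A small script can show these exit-signal rules with plain processes. This is only a sketch using bare `spawn`ed processes rather than GenServers; the variable names and the `await_down` helper are made up for the example.

```elixir
# Helper: wait for the :DOWN message of a monitor and return the exit reason.
await_down = fn ref ->
  receive do
    {:DOWN, ^ref, :process, _pid, reason} -> reason
  after
    1_000 -> :timeout
  end
end

# 1. A process that is not trapping exits dies with the signal's reason.
plain = spawn(fn -> Process.sleep(:infinity) end)
plain_ref = Process.monitor(plain)
Process.exit(plain, :shutdown)
plain_reason = await_down.(plain_ref)

# 2. A process that traps exits receives the signal as an ordinary message.
parent = self()

trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end

    Process.sleep(:infinity)
  end)

# Wait until the flag is set before sending the signal.
receive do
  :ready -> :ok
after
  1_000 -> raise "trapper never became ready"
end

Process.exit(trapper, :shutdown)

trapped_reason =
  receive do
    {:trapped, reason} -> reason
  after
    1_000 -> :timeout
  end

# 3. :kill cannot be trapped; observers see the reason :killed.
trapper_ref = Process.monitor(trapper)
Process.exit(trapper, :kill)
kill_reason = await_down.(trapper_ref)
```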
The idiomatic sequence implemented by `terminate_child` in `DynamicSupervisor` and `Supervisor` is:
- `Process.exit(pid, :shutdown)`
- wait for the `:DOWN` message
- if it doesn’t arrive (default in 5s), `Process.exit(pid, :kill)`
Prefer using those functions over building custom `Process.exit` setups unless you have a really good reason.
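For the original `terminate_child_by_key` question, one possible shape is sketched below. It assumes the key-to-pid lookup goes through a `Registry`; `Sketch.Registry`, `Sketch.Supervisor`, and `Sketch.Children` are hypothetical names, and an `Agent` stands in for the real GenServer child.

```elixir
defmodule Sketch.Children do
  # Start a child registered under `key` (an Agent is used as a placeholder worker).
  def start_child(key) do
    name = {:via, Registry, {Sketch.Registry, key}}

    spec = %{
      id: key,
      start: {Agent, :start_link, [fn -> key end, [name: name]]},
      restart: :transient
    }

    DynamicSupervisor.start_child(Sketch.Supervisor, spec)
  end

  # Look the child up by key, then let the supervisor run its shutdown sequence.
  def terminate_child_by_key(key) do
    case Registry.lookup(Sketch.Registry, key) do
      [{pid, _value}] -> DynamicSupervisor.terminate_child(Sketch.Supervisor, pid)
      [] -> {:error, :not_found}
    end
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: Sketch.Registry)
{:ok, _sup} = DynamicSupervisor.start_link(strategy: :one_for_one, name: Sketch.Supervisor)

{:ok, pid} = Sketch.Children.start_child("foo")
result = Sketch.Children.terminate_child_by_key("foo")
```

`DynamicSupervisor.terminate_child/2` only returns once the child is gone, so nothing needs to wait on the pid afterwards.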
Re: reanimating processes - `restart: :transient` will restart the process if it exits with a reason other than `:normal`, `:shutdown`, or `{:shutdown, term()}` - for instance, if the process doesn’t respond to the `:shutdown` signal and gets killed. If that’s undesirable, consider `restart: :temporary` instead.
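The `:transient` rule can be watched in action with a worker that reports every (re)start. This is a sketch only; `Flaky` and its `{:exit_with, reason}` message are invented for the example, and the abnormal exit will print a crash report to the log.

```elixir
defmodule Flaky do
  use GenServer, restart: :transient

  def start_link(observer), do: GenServer.start_link(__MODULE__, observer)

  @impl true
  def init(observer) do
    # Tell the observer every time the supervisor (re)starts this worker.
    send(observer, {:started, self()})
    {:ok, observer}
  end

  @impl true
  def handle_cast({:exit_with, reason}, state), do: {:stop, reason, state}
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, first} = DynamicSupervisor.start_child(sup, {Flaky, self()})

receive do
  {:started, ^first} -> :ok
after
  1_000 -> raise "worker never started"
end

# Abnormal exit: a :transient child is restarted under a new pid.
GenServer.cast(first, {:exit_with, :boom})

second =
  receive do
    {:started, pid} -> pid
  after
    1_000 -> raise "worker was not restarted"
  end

# Clean exit with :shutdown: no restart follows.
GenServer.cast(second, {:exit_with, :shutdown})

restarted_again =
  receive do
    {:started, _pid} -> true
  after
    200 -> false
  end
```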
One final note: pay close attention to the gotchas listed in the `terminate` callback’s documentation. If you need a 100% reliable “GenServer went away” hook, consider using `Process.monitor` and handling the `{:DOWN, ...}` message.
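A monitor-based hook could look roughly like this. `DeathWatcher` and its `notify` callback are hypothetical; the callback stands in for whatever actually publishes to the broker.

```elixir
defmodule DeathWatcher do
  use GenServer

  # `notify` is a 2-arity function (key, reason); a placeholder for a broker publish.
  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  def watch(watcher, key, pid), do: GenServer.call(watcher, {:watch, key, pid})

  @impl true
  def init(notify), do: {:ok, %{notify: notify, refs: %{}}}

  @impl true
  def handle_call({:watch, key, pid}, _from, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, %{state | refs: Map.put(state.refs, ref, key)}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    # Fires for every exit reason, including :kill, where terminate/2 never runs.
    {key, refs} = Map.pop(state.refs, ref)
    state.notify.(key, reason)
    {:noreply, %{state | refs: refs}}
  end
end

parent = self()

{:ok, watcher} =
  DeathWatcher.start_link(fn key, reason -> send(parent, {:child_died, key, reason}) end)

child = spawn(fn -> Process.sleep(:infinity) end)
:ok = DeathWatcher.watch(watcher, "foo", child)
Process.exit(child, :kill)

event =
  receive do
    {:child_died, _key, _reason} = msg -> msg
  after
    1_000 -> :timeout
  end
```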
Thanks for the tip. I’ll try and combine the use of `DynamicSupervisor.terminate_child` and a registry and see if that gives me the kind of cleanliness of code I’m looking for.
Ah, just fyi you don’t have to combine them. If you use DynamicSupervisor.terminate_child, you don’t need to use Registry, and vice versa. I think there are two major differences:
- Difference 1: naming
  - For the `DynamicSupervisor.terminate_child` strategy, the process that is responsible for starting the process must be able to assign its id.
  - For Registry, you can lazily “figure out what my id is” at launch time.
- Difference 2: performance
  - Registry is going to be faster, because there isn’t a lookup that fetches information that has to be copied from another process (it uses ETS which is blazing fast, the only block is an internal mutex in C).
In either case, be mindful of the {:normal/:shutdown}/:kill/everything-else semantics with respect to restart logic.
Thanks for all the information here! I ended up deciding not to use `Process.exit`. Instead I followed @ityonemo’s suggestion of sending a “soft halt” by doing a `GenServer.call(pid, :halt_and_cleanup)`. This gives me a `handle_call` clause for this explicit type of termination, which in turn lets me publish the “child died” message on my broker and then safely return `{:stop, :normal, :ok, state}`.
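For anyone finding this later, the shape of that handler is roughly the following. It is a simplified sketch rather than my actual code: `HaltableChild` is a made-up module name, and `publish` is a hypothetical function standing in for the broker client.

```elixir
defmodule HaltableChild do
  use GenServer, restart: :transient

  # `publish` is a 1-arity function; a placeholder for the real broker publish call.
  def start_link({key, publish}), do: GenServer.start_link(__MODULE__, {key, publish})

  def halt(pid), do: GenServer.call(pid, :halt_and_cleanup)

  @impl true
  def init({key, publish}), do: {:ok, %{key: key, publish: publish}}

  @impl true
  def handle_call(:halt_and_cleanup, _from, state) do
    # Emit the "child died" event first, then stop cleanly so no restart happens.
    state.publish.({:child_died, state.key})
    {:stop, :normal, :ok, state}
  end
end

parent = self()
publish = fn event -> send(parent, event) end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {HaltableChild, {"foo", publish}})

halt_result = HaltableChild.halt(pid)
```

Because the event is published before the reply is sent, the caller can rely on it having gone out by the time `halt/1` returns.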
I also adopted the use of the Registry and I’m now storing the child pids there along with their keys.
Nice! I feel like this is the idiomatic Elixir solution.