Proper pattern for terminating children of a dynamic supervisor

autodidaddict · June 2, 2021, 4:37pm

I’ve reviewed the other topics here and done a bunch of googling and haven’t been able to find out the right way to deal with this.

I have DynamicSupervisors, about 4-5 of them. Each of those spawns GenServers as children via DynamicSupervisor.start_child.

What I then need to be able to do is expose a function in the supervisor that will terminate a child by key, so it needs to: a) look up the child (this works fine), b) terminate the child.

The problem is I can’t figure out how to “cleanly” and “idiomatically” terminate the child process. In my terminate_child_by_key(foo) function, what’s the accepted way of shutting that pid down? Not only do I need to shut that pid down, but I need to be able to publish a message on my broker so that I can emit the “child died” event.

Since I’m going to do this over and over again, I want to do it right. I’ve tried using Process.flag(..) in the child as a way of getting advance notice that the child is going to die, but handle_info never gets called. When I set the process flag in the supervisor instead, the supervisor never gets notified when a child dies.

Additionally, for this scenario, I can’t use transient children (?) because they come back immediately after I kill them… the behavior I want is when I choose to kill the child, it’ll stay dead, but when it dies due to exception failure, it’ll restart.

Any advice would be greatly appreciated as I’ve been doing this all in a very ugly fashion and I don’t want to continue repeating that same ugliness all over my code base.

axelson · June 2, 2021, 11:37pm

Have you tried using DynamicSupervisor.terminate_child/2?

If you’re stopping a transient child cleanly then it should not be restarted.
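A minimal sketch of that approach, assuming a hypothetical `TransientDemo` worker module (the module name and the `:some_key` argument are made up for illustration):

```elixir
defmodule TransientDemo do
  # restart: :transient means "restart only after an abnormal exit".
  use GenServer, restart: :transient

  def start_link(key), do: GenServer.start_link(__MODULE__, key)

  @impl true
  def init(key), do: {:ok, key}
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {TransientDemo, :some_key})

# The supervisor shuts the child down and removes it from its child list.
:ok = DynamicSupervisor.terminate_child(sup, pid)

children = DynamicSupervisor.count_children(sup)
```

`terminate_child/2` takes the child's pid, so the lookup by key still has to happen somewhere else first.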

ityonemo · June 2, 2021, 11:55pm

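As far as I know, GenServers don’t trap exits by default; you have to call Process.flag(:trap_exit, true) in init/1. Once that’s set, a shutdown coming from the supervisor is handled by the terminate/2 callback rather than by handle_info, which would explain why your handle_info never fires.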

ityonemo · June 3, 2021, 12:00am

Also, I would say the idiomatic thing to do in Elixir is to spin up a Registry and send a command to the GenServer telling it to self-destruct (something like def handle_call(:stop, _, state), do: {:stop, :normal, :ok, state}). Trawling through a supervisor’s child list seems like a very ungraceful thing to do.

As an aside, note that the reason for termination matters. IIRC you should supervise with restart: :transient; an exit with reason :normal, :shutdown, or {:shutdown, term} will not trigger a restart, while everything else (exceptions, kills, brutal_kills, custom reasons) will.
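Here is a rough sketch of the Registry-plus-stop-command idea. The `KeyedWorker` module, the `KeyedWorker.Registry` name, and the `"foo"` key are all hypothetical; only the `handle_call(:stop, ...)` clause comes from the suggestion above.

```elixir
defmodule KeyedWorker do
  use GenServer, restart: :transient

  # Register the process under its key through a :via tuple.
  def start_link(key), do: GenServer.start_link(__MODULE__, key, name: via(key))

  # Public API: ask the worker registered under `key` to stop itself.
  def stop(key), do: GenServer.call(via(key), :stop)

  defp via(key), do: {:via, Registry, {KeyedWorker.Registry, key}}

  @impl true
  def init(key), do: {:ok, key}

  # Reply :ok, then exit with reason :normal so a transient child stays down.
  @impl true
  def handle_call(:stop, _from, state), do: {:stop, :normal, :ok, state}
end

{:ok, _} = Registry.start_link(keys: :unique, name: KeyedWorker.Registry)
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {KeyedWorker, "foo"})

ref = Process.monitor(pid)
:ok = KeyedWorker.stop("foo")

exit_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  end
```

The caller never touches the supervisor's child list; the key alone is enough to reach the process.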

al2o3cr · June 3, 2021, 12:24am

If the process is actively handling messages (versus being blocked in a call or similar), consider adding an explicit “hey could you please shut down” message to the GenServer’s public API. The handle_call head for that can return {:stop, :shutdown, :ok, state} and the GenServer will exit with reason :shutdown.

Otherwise, signaling from outside is done via Process.exit/2 and functions built using it. It’s useful to understand exactly what “signaling a process to exit” means:

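If any Erlang process gets an exit signal with a reason of :kill, it will exit immediately and unconditionally; links and monitors see the exit reason :killed.
If a process that isn’t trapping exits gets an exit signal with any reason other than :normal, it will exit immediately with the same reason. (A :normal signal sent from another process is ignored.)
If a GenServer is trapping exits and the signal comes from its parent (usually the supervisor), the built-in handler from gen_server will invoke the terminate/2 callback with the reason. Exit signals from other processes arrive as {:EXIT, from, reason} messages in handle_info/2.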

(The above is summarized from “The many and varied ways to kill an OTP Process” on The furlough log of Paul Wilson.)
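To make those rules concrete, here is a small sketch using bare processes rather than GenServers. The variable names are made up; the `:ready` handshake only ensures the trap_exit flag is set before the signal is sent.

```elixir
parent = self()

# 1. Not trapping exits: the process dies with the reason it was sent.
plain = spawn(fn -> Process.sleep(:infinity) end)
plain_ref = Process.monitor(plain)
Process.exit(plain, :shutdown)

plain_reason =
  receive do
    {:DOWN, ^plain_ref, :process, ^plain, reason} -> reason
  end

# 2. Trapping exits: the signal is converted into an {:EXIT, from, reason} message.
trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end
  end)

receive do: (:ready -> :ok)
Process.exit(trapper, :shutdown)

trapped_reason =
  receive do
    {:trapped, reason} -> reason
  end

# 3. :kill cannot be trapped; the process dies and monitors see :killed.
stubborn =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)
    Process.sleep(:infinity)
  end)

receive do: (:ready -> :ok)
stubborn_ref = Process.monitor(stubborn)
Process.exit(stubborn, :kill)

kill_reason =
  receive do
    {:DOWN, ^stubborn_ref, :process, ^stubborn, reason} -> reason
  end
```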

The idiomatic sequence implemented by terminate_child in DynamicSupervisor and Supervisor is:

Process.exit(pid, :shutdown)
wait for the :DOWN message
if it doesn’t arrive (default in 5s), Process.exit(pid, :kill)
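Purely to illustrate those three steps, here is a hand-rolled sketch. `ShutdownSketch` is a hypothetical module, not the supervisor's actual code; in real code, call `terminate_child` instead.

```elixir
defmodule ShutdownSketch do
  # Ask politely with :shutdown, wait up to `timeout` ms, then fall back to :kill.
  def stop(pid, timeout \\ 5_000) do
    ref = Process.monitor(pid)
    Process.exit(pid, :shutdown)

    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> {:ok, reason}
    after
      timeout ->
        # The process ignored :shutdown (for example, it traps exits and is busy).
        Process.exit(pid, :kill)

        receive do
          {:DOWN, ^ref, :process, ^pid, reason} -> {:ok, reason}
        end
    end
  end
end

parent = self()

# A process that does not trap exits goes down on the first signal.
polite = spawn(fn -> Process.sleep(:infinity) end)
polite_result = ShutdownSketch.stop(polite, 50)

# A process that traps exits and never reads its mailbox has to be killed.
stubborn =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)
    Process.sleep(:infinity)
  end)

receive do: (:ready -> :ok)
stubborn_result = ShutdownSketch.stop(stubborn, 50)
```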

Prefer using those functions over building custom Process.exit setups unless you have a really good reason.

Re: reanimating processes - restart: :transient will restart the process if it exits with a reason other than :normal, :shutdown, or {:shutdown, term()} - for instance, if the process doesn’t respond to the :shutdown signal and gets killed. If that’s undesirable, consider restart: :temporary instead.
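For reference, a short sketch of where the restart value lives. The `RestartDemo` module is hypothetical; `use GenServer, restart: ...` and `Supervisor.child_spec/2` are the standard ways to set or override it.

```elixir
defmodule RestartDemo do
  # The default child spec generated by `use GenServer` picks up this option.
  use GenServer, restart: :transient

  def start_link(key), do: GenServer.start_link(__MODULE__, key)

  @impl true
  def init(key), do: {:ok, key}
end

# The module's own child spec says :transient ...
default_spec = RestartDemo.child_spec(:some_key)

# ... and a caller can override it per child, here with :temporary (never restarted).
temporary_spec = Supervisor.child_spec({RestartDemo, :some_key}, restart: :temporary)

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, temporary_spec)
```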

One final note: pay close attention to the gotchas listed in the terminate callback’s documentation. If you need a 100% reliable “GenServer went away” hook, consider using Process.monitor and handling the {:DOWN, ...} message.
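A minimal sketch of such a monitor-based hook, assuming a hypothetical `DeathWatcher` process. The `notify` pid stands in for whatever actually publishes the “child died” event (the broker in the original question).

```elixir
defmodule DeathWatcher do
  use GenServer

  def start_link(notify), do: GenServer.start_link(__MODULE__, notify)

  # Start monitoring `pid` and remember which key it belongs to.
  def watch(watcher, key, pid), do: GenServer.call(watcher, {:watch, key, pid})

  @impl true
  def init(notify), do: {:ok, %{notify: notify, refs: %{}}}

  @impl true
  def handle_call({:watch, key, pid}, _from, state) do
    ref = Process.monitor(pid)
    {:reply, :ok, %{state | refs: Map.put(state.refs, ref, key)}}
  end

  # Delivered however the watched process exits, including a :kill.
  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, state) do
    {key, refs} = Map.pop(state.refs, ref)
    send(state.notify, {:child_died, key, reason})
    {:noreply, %{state | refs: refs}}
  end
end

{:ok, watcher} = DeathWatcher.start_link(self())
child = spawn(fn -> Process.sleep(:infinity) end)
:ok = DeathWatcher.watch(watcher, "foo", child)

# Even a brutal kill, which skips terminate/2 entirely, produces the event.
Process.exit(child, :kill)

event =
  receive do
    {:child_died, _key, _reason} = msg -> msg
  after
    1_000 -> :timeout
  end
```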

autodidaddict · June 3, 2021, 12:56pm

Thanks for the tip. I’ll try and combine the use of DynamicSupervisor.terminate_child and a registry and see if that gives me the kind of cleanliness of code I’m looking for.

ityonemo · June 3, 2021, 1:39pm

Ah, just fyi you don’t have to combine them. If you use DynamicSupervisor.terminate_child, you don’t need to use Registry, and vice versa. I think there are two major differences:

Difference 1: naming

For DynamicSupervisor.terminate_child strategy, the process that is responsible for starting the process must be able to assign its id.
For Registry, you can lazily “figure out what my id is” at launch time.

Difference 2: performance.

Registry is going to be faster, because there isn’t a lookup that fetches information that has to be copied from another process (it uses ETS which is blazing fast, the only block is an internal mutex in C).

In either case, be mindful of the {:normal/:shutdown}/:kill/everything-else semantics with respect to restart logic.

autodidaddict · June 3, 2021, 2:07pm

Thanks for all the information here! I ended up deciding not to use Process.exit. Instead I followed @ityonemo’s suggestion of sending a “soft halt” by doing a GenServer.call(pid, :halt_and_cleanup). This gives me a handle_call for this explicit type of termination, which in turn lets me publish the “child died” message on my broker and then safely return {:stop, :normal, :ok, state}.

I also adopted the use of the Registry and I’m now storing the child pids there along with their keys.
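Roughly, the shape of it looks like the sketch below. The `Child` module, the `Child.Registry` name, and the `notify` pid (a stand-in for the broker publish) are simplified placeholders, not the real code.

```elixir
defmodule Child do
  use GenServer, restart: :transient

  def start_link({key, notify}), do: GenServer.start_link(__MODULE__, {key, notify})

  # Look the pid up by key, then ask the child to halt itself.
  def terminate_child_by_key(key) do
    case Registry.lookup(Child.Registry, key) do
      [{pid, _value}] -> GenServer.call(pid, :halt_and_cleanup)
      [] -> {:error, :not_found}
    end
  end

  @impl true
  def init({key, notify}) do
    # Registers the GenServer process itself under `key`.
    {:ok, _} = Registry.register(Child.Registry, key, nil)
    {:ok, %{key: key, notify: notify}}
  end

  @impl true
  def handle_call(:halt_and_cleanup, _from, state) do
    # Placeholder for publishing the "child died" event on the broker.
    send(state.notify, {:child_died, state.key})
    {:stop, :normal, :ok, state}
  end
end

{:ok, _} = Registry.start_link(keys: :unique, name: Child.Registry)
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _pid} = DynamicSupervisor.start_child(sup, {Child, {"foo", self()}})

result = Child.terminate_child_by_key("foo")

died_key =
  receive do
    {:child_died, key} -> key
  after
    1_000 -> :timeout
  end
```

Because the child exits with :normal, the transient restart rule leaves it stopped, while a crash would still bring it back.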

ityonemo · June 3, 2021, 2:41pm

Nice! I feel like this is the idiomatic Elixir solution.