DynamicSupervisor vs Supervisor, different behaviour with .terminate_child

Hi, just a quick question here about the use of DynamicSupervisor.
I have been using Supervisor to manage child GenServer procs, but I was advised to switch to DynamicSupervisor so that I could start and stop child procs manually, so I went ahead and followed the instructions. Everything seemed fine, but I have a function that calls terminate_child with a pid obtained from GenServer.whereis and a “via tuple”. If that lookup returns nil and the nil is passed as the pid, DynamicSupervisor raises:
** (FunctionClauseError) no function clause matching in DynamicSupervisor.terminate_child/2

    The following arguments were given to DynamicSupervisor.terminate_child/2:

        # 1
        IslandsEngine.GameSupervisor

        # 2
        nil

    Attempted function clauses (showing 1 out of 1):

        def terminate_child(supervisor, pid) when -is_pid(pid)-

The behaviour in Supervisor was to return the error tuple {:error, :not_found}. It also appears that the function in DynamicSupervisor explicitly checks for a valid pid, so I’m guessing that Supervisor did not do this? Or is there something I’m missing here? I realise that repeatedly trying to stop an already stopped process is not something one wants to do, but it would be useful to know how to do this safely without changing the calling code too much. Many thanks.

Just to add that I was finding that the error was causing the supervisor proc to bring back a dead child; it was then stopping and restarting it again on every second call to terminate. Well, I managed to stop the fun by wrapping the original call to Supervisor:
Supervisor.terminate_child(__MODULE__, pid_from_name(player1_name))

in the following:

case pid_from_name(player1_name) do
  nil -> {:error, :not_found}
  pid -> DynamicSupervisor.terminate_child(__MODULE__, pid)
end

to keep the behaviour the same as before, which seems to have settled things down again.

Having read more about Elixir processes, it appears that the last update might not fix the situation, since it could introduce a ‘race condition’: the pid is checked before the DynamicSupervisor.terminate_child call, but it could go to nil again before the call is made, which puts me back in the original situation. So I’m not sure what the safe alternative would be. Any suggestions/advice very much welcomed.

If you have a nil then just do not call terminate_child, since your process is not alive. Isn’t that enough? Or you can use GenServer.stop!

Thanks for your suggestion, but it doesn’t help sorry.

In answer to your suggestion, “if I have a nil then …”
This represents 2 statements. I need a single atomic statement.

Because, …
Case 1: I check pid and see a nil. So I don’t call terminate_child.
However, in the next instant pid might not be nil, so I lost an opportunity!
Case 2: I check pid and see something like #PID<0.505.0>. So I call terminate_child. However, pid might now be nil, so I’m falling over a cliff!

Going back to the original problem, what I’ve found is that, although I’m using DynamicSupervisor, I can still call Supervisor.terminate_child, and that accepts nil no problem (I guess it must be an Elixir macro thing with the original use DynamicSupervisor, no?). In this case I get back a tuple I can work with, {:error, :not_found}, so maybe I’ll continue with this for a while until the DynamicSupervisor.terminate_child thing is better understood …

Calling GenServer.stop(pid) won’t work either I’m afraid, since it crashes the client because the pid is nil:
** (exit) exited in: GenServer.stop(nil, :normal, :infinity)
** (EXIT) no process: the process is not alive or there’s no process currently associated with the given name, possibly because its application isn’t started
(elixir 1.10.2) lib/gen_server.ex:971: GenServer.stop/3

Maybe I could do a try/catch on this and see if the exit signal can be converted into a conventional Elixir structure. Thanks for the suggestion, I’ll investigate this further.

Best.
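
A rough sketch of the try/catch idea mentioned above, assuming the goal is simply to turn the "no process" exit from GenServer.stop/3 into an error tuple. The module name SafeStop and the choice of {:error, :not_found} as the return value are illustrative assumptions, not something from the book or the thread:

```elixir
defmodule SafeStop do
  # Hypothetical helper: wraps GenServer.stop/3 so that a missing process
  # yields {:error, :not_found} instead of exiting the caller.
  def stop(server, reason \\ :normal, timeout \\ :infinity) do
    GenServer.stop(server, reason, timeout)
  catch
    # GenServer.stop/3 exits with {:noproc, {GenServer, :stop, args}}
    # when the name resolves to nil or the pid is no longer alive.
    :exit, {:noproc, _} -> {:error, :not_found}
  end
end
```

This only changes how the failure is reported. It does not make the lookup and the stop atomic, and any other exit reason still propagates to the caller.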

Why this requirement? That seems absurd, sorry.

From what you describe, even if you have an atomic way of terminating your process, the pid can be non-nil again the instant after. If the process can be started from elsewhere at any moment, whatever way you choose to terminate it will not change that fact. And if you have a nil, and one instant later the process is alive, then it is for a reason, so why would you want to stop it? Just do not start it, no?

So in the end that boils down to the following situation:

  • either you have a process pid and you can stop that process
  • or you don’t and there is not much you can do.

The solution is not atomic functions, but idempotency. Calling terminate_child for a pid will always make sure the pid won’t be left alive.

You have the same problem when calling terminate_child directly. At one moment the pid doesn’t exist, but a moment later a process is spawned with that pid. Also I’m not sure if that’s actually a problem given the reuse strategy for pids.

A returned pid #PID<0.505.0> cannot magically become nil. It’ll stay a pid, even if it refers to a process that is no longer alive. So you can call terminate_child every time you get a pid:

case GenServer.whereis(name) do
  pid when is_pid(pid) -> DynamicSupervisor.terminate_child(supervisor, pid)
  nil -> {:error, :not_found}
end

DynamicSupervisor.terminate_child is different from Supervisor.terminate_child because the Supervisor one is not actually meant to be called with a pid, but with the child id known to the supervisor. Calling it with a pid will be deprecated in Elixir 1.11, based on a comment in the source code.
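
To illustrate the "call it every time you get a pid" point, here is a minimal sketch that terminates the same child twice. The throwaway supervisor and the Agent child are assumptions made purely for the demonstration:

```elixir
# Start an unnamed DynamicSupervisor and one child (an Agent, for illustration only).
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {Agent, fn -> 0 end})

# The first call stops the child and returns :ok.
first = DynamicSupervisor.terminate_child(sup, pid)

# The pid value is still a pid, but the supervisor no longer knows it,
# so the second call returns {:error, :not_found} rather than raising.
second = DynamicSupervisor.terminate_child(sup, pid)
```

So a stale pid is harmless here: the worst case is an error tuple, which is the same shape the nil branch of the case expression returns.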

Nice guys. These are wonderful comments (lud- yes I agree with you, it would be absurd, but there are mitigating circumstances, please bear with me …).

LostKobrakai: You make a great point. What you are saying is: “once a pid, always a pid”, or in my speak, an Elixir pid leaves a footprint - whether it’s dead or alive a pid will always have a representation as a pid (and not nil). Having read the docs for Supervisor.terminate_child and DynamicSupervisor.terminate_child a bit more I see that, as per your comments, whereas DynamicSupervisor demands a pid to effect the match, Supervisor is a bit more lenient, admitting an “ID”. I like your recommended snippet:

case GenServer.whereis(name) do
  pid when is_pid(pid) -> DynamicSupervisor.terminate_child(supervisor, pid)
  nil -> {:error, :not_found}
end

I had the following:

case Game.pid_from_name(name) do # a GenServer.whereis call from a via tuple
  nil -> {:error, :not_found}
  pid -> DynamicSupervisor.terminate_child(supervisor, pid)
end

Your code looks more precise. The only question I would have is whether there can be a third outcome, which would need a final _ catch-all clause, or whether these two clauses cover all possible outcomes.

So, that just leaves the question as to why I was a bit worried in the first place. Well, I’m calling a supervisor process that manages a sole child proc, and its module has just 2 functions: one to start and one to stop the child that manages a game between 2 players. The code is taken from a book on functional programming and web design with Elixir and the Phoenix framework. In the book there is a GameSupervisor proc that starts a “game” process created from an Elixir via tuple based on an arbitrary name. The rules for calling “start_game” and “stop_game” are left open for the reader to implement. Starting a game process associates an arbitrary name with an Elixir process, while stopping a game process removes the entry from the Elixir Registry (hence a nil value is possible on lookup).

Check the docs – in this case the typespec – of GenServer.whereis:

iex(1)> h GenServer.whereis

                              def whereis(server)

  @spec whereis(server()) :: pid() | {atom(), node()} | nil

#...the rest follows

That gives you the necessary information to make an exhaustive pattern matching.
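
Going by that typespec, an exhaustive match might look roughly like the sketch below. The module name GameStopper and the {:error, :not_local} return value are assumptions for illustration. The {name, node} shape is only returned when the server is given as a {name, node} tuple pointing at another node, so with a via tuple it should not come up in practice:

```elixir
defmodule GameStopper do
  # Hypothetical wrapper covering all three shapes in the whereis/1 typespec.
  def stop_game(supervisor, name) do
    case GenServer.whereis(name) do
      # A pid (possibly of a process that has just died): safe to pass on.
      pid when is_pid(pid) ->
        DynamicSupervisor.terminate_child(supervisor, pid)

      # A {name, node} reference to a process on another node: there is no
      # pid to hand to terminate_child/2, so report it (assumed policy).
      {registered_name, node} when is_atom(registered_name) and is_atom(node) ->
        {:error, :not_local}

      # Nothing registered under that name.
      nil ->
        {:error, :not_found}
    end
  end
end
```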

Thank you.