Log when child node exits or goes down

stevensonmt · January 8, 2022, 11:10pm

In the Elixir for Programmers course, @pragdave asks the student to come up with a means of monitoring when a child of a dynamic supervisor “goes away”. By “goes away” I assume he means that it exits gracefully or the node crashes out. I can easily monitor when nodes connect by implementing handle_info: right after the call to DynamicSupervisor.start_child I call send(pid, DynamicSupervisor.which_children(my_supervisor_name)). I don’t yet have a graceful exit implemented for the game, so right now I only have to handle the case where the node crashes out. Since this happens without a message being sent to the supervisor, I don’t know how to trigger a reaction to the event. Any suggestions?

===========
Actually, it occurs to me that I might not have been killing or exiting the node in a reasonable way. I’m starting the supervisor in an IEx session, then starting the child process from another terminal. Then I press ctrl-c to kill the IEx session that started the child process. When running the observer on the supervisor I do not see any message coming into the supervisor, but the child process PID persists. So maybe as far as the VM is concerned nothing has happened?
For the child processes that are no longer active, the observer shows “status waiting” in the Process Information tab, but in the State tab “status” is “running” and Logged Events is an empty list.

joaoevangelista · January 9, 2022, 4:11am

Disclaimer: I have no idea if this is the correct usage.

What you could do is monitor the child when you start it

{:ok, child} = DynamicSupervisor.start_child(MyDynSupervisor, child_spec)
Process.monitor(child)

Now, when you kill the child, the process that is monitoring it will receive a {:DOWN, ref, :process, object, reason} message that you can act upon.
To wrap this behaviour you could introduce a GenServer that delegates starting children to the underlying supervisor and monitors them.

Example

defmodule MyListener do
  use GenServer

  @doc false
  def start_link do
    GenServer.start_link(__MODULE__, nil, name: MyListener)
  end

  @impl true
  def init(_) do
    DynamicSupervisor.start_link(name: MySup, strategy: :one_for_one)
  end


  def start_child(server, child_spec) do
    GenServer.cast(server, {:start_child, child_spec})
  end


  def children(server) do
    GenServer.call(server, :children)
  end

  def terminate_child(server, child) do
    GenServer.cast(server, {:terminate, child})
  end

  @impl true
  def handle_cast({:start_child, child_spec}, dyn) do
    {:ok, child} = DynamicSupervisor.start_child(dyn, child_spec)
    Process.monitor(child)
    {:noreply, dyn}
  end

  def handle_cast({:terminate, child}, dyn) do
    DynamicSupervisor.terminate_child(dyn, child)
    {:noreply, dyn}
  end

  @impl true
  def handle_call(:children, _from, dyn) do
    children = DynamicSupervisor.which_children(dyn)
    {:reply, children, dyn}
  end

  @impl true
  def handle_info(msg, dyn) do
    IO.inspect(msg)
    {:noreply, dyn}
  end
end

{:ok, server} = MyListener.start_link()
MyListener.start_child(server, {Agent, fn -> %{} end})
pids = for {_, pid, _, _} <- MyListener.children(server), do: pid
MyListener.terminate_child(server, Enum.random(pids))

You can paste the example into an IEx session.

stevensonmt · January 9, 2022, 4:15am

Thanks but that doesn’t seem to work. The child process persists with no message sent to the supervisor.

When the observer is running and I kill the child-spawning IEX session, I see a “Node node_name down” message box, but the processes it was running do not change. I’d like my supervisor to be able to receive that “node down” message and respond to it.

joaoevangelista · January 9, 2022, 4:25am

The child will persist, since the supervisor is watching it and restarts it when it dies.
The supervisor will not get a message that you can act upon; that’s why we set up a monitor in another process, so that it is notified when the child dies.
To terminate a child without the supervisor restarting it, you need to call DynamicSupervisor.terminate_child/2 (see updated example above).
Your GenServer will then receive a message in its handle_info callback that you can act upon.
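To make the shape of that message concrete, here is a minimal sketch of a process that starts children under an existing DynamicSupervisor, monitors each one, and pattern-matches the :DOWN message in handle_info. The module name ChildLogger and its state layout are made up for illustration; this is one possible arrangement, not the course's solution.

```elixir
defmodule ChildLogger do
  # Hypothetical helper: starts children under `sup` and logs when they go down.
  use GenServer
  require Logger

  def start_link(sup), do: GenServer.start_link(__MODULE__, sup)

  def start_child(server, child_spec),
    do: GenServer.call(server, {:start_child, child_spec})

  @impl true
  def init(sup), do: {:ok, %{sup: sup, refs: %{}}}

  @impl true
  def handle_call({:start_child, spec}, _from, state) do
    case DynamicSupervisor.start_child(state.sup, spec) do
      {:ok, pid} ->
        # The monitor belongs to this GenServer, so :DOWN arrives here.
        ref = Process.monitor(pid)
        {:reply, {:ok, pid}, put_in(state.refs[ref], pid)}

      other ->
        {:reply, other, state}
    end
  end

  @impl true
  def handle_info({:DOWN, ref, :process, pid, reason}, state) do
    Logger.info("child #{inspect(pid)} went down: #{inspect(reason)}")
    {:noreply, update_in(state.refs, &Map.delete(&1, ref))}
  end
end
```

Note that a monitor is tied to one pid. If the supervisor restarts a crashed child, the replacement gets a new pid that this sketch does not monitor.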

joaoevangelista · January 9, 2022, 4:26am

Nodes are VMs: if you start two IEx sessions, that is two nodes (two VMs).

stevensonmt · January 9, 2022, 4:33am

Sorry for being so slow to get it. Thank you for your help. I understand the two nodes are separate VMs, I was simply pointing out that the VM running the parent process was getting some sort of message when the VM running the child process went down. If the observer can see that message I assume there is a way to read that message for my application as well.
I also appreciate you pointing out terminate_child/2 but my problem is figuring out how to trigger calling that.

stevensonmt · January 9, 2022, 4:44am

Actually I think I just realized what is wrong with my thinking. Both processes are running on the same VM. The client node can disconnect but the game process persists.

I think I need to send the pid of the client node process that calls for the game to start.
Right now the client does this to start a new game:

def connect() do 
  :rpc.call(@my_game_server, MyGame, :new_game, [])
end

and then MyGame.new_game/0 calls the server’s start_game/0 function that is under the supervisor.
I think I need MyGame.new_game/1 that takes a pid from the client via :rpc.call(@my_game_server, MyGame, :new_game, [self()]) and then I can have Server.start_game/1 that takes that pid and calls Process.monitor(pid).
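A rough sketch of that idea, assuming the game is a GenServer started under a DynamicSupervisor and that the client's pid is handed in as the start argument. GameSketch is a hypothetical stand-in for the real game server, and `restart: :temporary` is my assumption so the supervisor does not restart a game that stopped on purpose.

```elixir
defmodule GameSketch do
  # Hypothetical stand-in for the game process; not the course's Server module.
  use GenServer, restart: :temporary

  # `client` is the pid the caller passed along, e.g. the self() evaluated
  # on the client side of :rpc.call(node, mod, fun, [self()]).
  def start_link(client), do: GenServer.start_link(__MODULE__, client)

  @impl true
  def init(client) do
    # Process.monitor/1 also accepts pids that live on another node.
    ref = Process.monitor(client)
    {:ok, %{client: client, ref: ref}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{ref: ref} = state) do
    IO.puts("client gone (#{inspect(reason)}), stopping game")
    {:stop, :normal, state}
  end
end
```

It would be started with something like DynamicSupervisor.start_child(sup, {GameSketch, client_pid}). When the monitored pid is on a node that disconnects, the reason in the :DOWN message is :noconnection.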

Maybe.

joaoevangelista · January 9, 2022, 4:52am

Oh, so you are also dealing with distribution. Yes, you could start the game on one node and call its functions using rpc. If you need to see whether the client node disconnects from the game node, you’ll need to use Node.monitor/2. I thought you wanted to log when a child of the dynamic supervisor exited.

stevensonmt · January 9, 2022, 4:53am

Yes, I was confused and asked the wrong question.

stevensonmt · January 9, 2022, 5:15am

Is there a way to tell the Supervisor to monitor the Node? When calling Node.monitor/2 in my start_game/0 function I don’t think the Supervisor is monitoring but rather the parent of the Supervisor. How do I find which process is doing the monitoring?

joaoevangelista · January 9, 2022, 4:38pm

You can use a GenServer to monitor the node. The process in which you call Node.monitor/2 is the one that does the monitoring. If you call it in a GenServer callback, that process is the GenServer, and you can then listen for the messages in its handle_info callback.
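As a minimal sketch of that arrangement (the module name and the MapSet state are assumptions for illustration), the call to Node.monitor/2 sits inside handle_call, so the GenServer itself is the process that receives {:nodedown, node}:

```elixir
defmodule NodeWatcherSketch do
  # Hypothetical watcher; tracks monitored nodes in a MapSet.
  use GenServer

  def start_link(_opts \\ []),
    do: GenServer.start_link(__MODULE__, MapSet.new(), name: __MODULE__)

  def watch(node), do: GenServer.call(__MODULE__, {:watch, node})
  def watched, do: GenServer.call(__MODULE__, :watched)

  @impl true
  def init(nodes), do: {:ok, nodes}

  @impl true
  def handle_call({:watch, node}, _from, nodes) do
    # Runs inside the GenServer process, so this process gets :nodedown.
    # Skip nodes already watched to avoid duplicate :nodedown messages.
    unless MapSet.member?(nodes, node), do: Node.monitor(node, true)
    {:reply, :ok, MapSet.put(nodes, node)}
  end

  def handle_call(:watched, _from, nodes),
    do: {:reply, MapSet.to_list(nodes), nodes}

  @impl true
  def handle_info({:nodedown, node}, nodes) do
    IO.puts("node #{node} went down")
    {:noreply, MapSet.delete(nodes, node)}
  end
end
```

Calling NodeWatcherSketch.watch(some_node) from any process still sets up the monitor in the watcher, because the call is executed by the GenServer. This only makes sense when the local VM was started as a distributed node.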

stevensonmt · January 9, 2022, 5:05pm

That’s what I’ve just done this morning. I had to set up a “NodeWatcher” GenServer that is started when the application starts. When a new game is started I check whether any new nodes are connected and, if so, add them to the NodeWatcher. The NodeWatcher calls Node.monitor/2 every time a node is added to its state. When the NodeWatcher receives a {:nodedown, node} message, a handle_info clause removes the node from the watcher’s state and prints confirmation of the removal to the terminal. Ultimately I think I’d like to keep games associated with nodes, so that if a given node goes down, a timer starts and the game shuts down if the node does not reconnect within some amount of time.
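One way the reconnect timer could look, as a sketch only. It assumes the process has already called Node.monitor/2 for the nodes it cares about, so it receives {:nodedown, node}. The names GraceTimerSketch and node_back/2 and the :grace_ms and :on_expire options are all invented here, and how a reconnect is detected is left to the caller.

```elixir
defmodule GraceTimerSketch do
  # Hypothetical: waits :grace_ms after a node goes down, then calls :on_expire.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Call this when the node is seen again to cancel its pending shutdown.
  def node_back(server, node), do: GenServer.call(server, {:node_back, node})

  @impl true
  def init(opts) do
    {:ok,
     %{
       grace_ms: Keyword.get(opts, :grace_ms, 30_000),
       on_expire: Keyword.fetch!(opts, :on_expire),
       timers: %{}
     }}
  end

  @impl true
  def handle_call({:node_back, node}, _from, state) do
    {timer, timers} = Map.pop(state.timers, node)
    if timer, do: Process.cancel_timer(timer)
    {:reply, :ok, %{state | timers: timers}}
  end

  @impl true
  def handle_info({:nodedown, node}, state) do
    if Map.has_key?(state.timers, node) do
      # A timer is already running for this node; keep the first deadline.
      {:noreply, state}
    else
      timer = Process.send_after(self(), {:grace_expired, node}, state.grace_ms)
      {:noreply, put_in(state.timers[node], timer)}
    end
  end

  def handle_info({:grace_expired, node}, state) do
    case Map.pop(state.timers, node) do
      # The node came back in the meantime; ignore the stale message.
      {nil, _timers} ->
        {:noreply, state}

      {_timer, timers} ->
        state.on_expire.(node)
        {:noreply, %{state | timers: timers}}
    end
  end
end
```

The :on_expire function is where the games associated with that node would be shut down, for example by calling DynamicSupervisor.terminate_child/2 for each of them.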