Retrieving state from exited process

I’m playing around with processes and I’m wondering if I’m barking up the wrong tree.

This is purely an academic exercise at this point, so feel free to criticize the approach (I’m just bumbling through things as I try to figure them out). I have a simple game-like simulation that uses multiple GenServers to store state and do a few simple things concurrently – think of it like the Hunger Games where each process is competing for survival. Based on the rules of the “game”, some processes will be killed, and the survivors will go on to the next round. Wash, rinse, repeat…

At the end of the game, I want to be able to retrieve the state from each process. This is easy with the processes that are still alive by the end because I can write a handle_call/3 function or something similar that will return the current state of the process.

However, I’ve been struggling with how to get the status of the processes that have been forcefully exited during the course of the game. How can I get the state of those processes (as well as the reason for their exit)?

In my MyGenserver module, I have set the following flags:

# GenServer callbacks:
def init(_opts) do
  Process.flag(:trap_exit, true)
  {:ok, %{example: "state"}}
end

def terminate(reason, state) do
  Map.put(state, :status, reason)
end

And then I have been playing around with something like this:

{:ok, pid} = MyGenserver.start_link([])
# update state... do stuff ...
Process.exit(pid, :disqualified)

receive do
  y -> IO.inspect(y, label: "RECEIVED....")
end

When I have the receive there, I do get some info that’s useful:

22:42:59.182 [error] GenServer #PID<0.244.0> terminating
** (stop) :disqualified
Last message: {:EXIT, #PID<0.92.0>, :disqualified}
State: %{example: "state"}
** (EXIT from #PID<0.92.0>) :disqualified

But I don’t follow exactly why I’m able to see that, or whether the receive block is doing anything at all, because my IO.inspect there never actually prints anything.

My hope was that at the end of the game, I could exit all the competing processes (some with :normal, others with :disqualified etc) and then use the same method to receive the last known state of each process to report back on what each one had done.

Can someone point me in the right direction? Thanks!

  1. Log everything you care about (logs carry structured data now) and write a logger handler that writes to a file, then recover the data from there afterwards.

Or

  2. Use ets tables. Maybe you can write to them by implementing a terminate/2 hook.
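
A rough sketch of the ets idea (the :player_results table name and the Player module here are purely illustrative):

# the game/parent process creates one shared table up front:
:ets.new(:player_results, [:named_table, :public, :set])

defmodule Player do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(opts) do
    # trap exits so terminate/2 also runs when the parent sends an exit signal
    Process.flag(:trap_exit, true)
    {:ok, %{id: Keyword.fetch!(opts, :id), example: "state"}}
  end

  # terminate/2's return value is ignored, so write the state somewhere
  # that outlives the process instead of returning it
  def terminate(reason, state) do
    :ets.insert(:player_results, {state.id, reason, state})
    :ok
  end
end

# after the players are gone, the results are still in the table:
:ets.tab2list(:player_results)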

When a process dies ALL its data is lost. The process heap, message queue, process dictionary and stack are lost. There is no way around this. Will this change in the future: no!

This means you have to explicitly save all information you may want when the process terminates.

You can’t rely on the terminate/2 callback, as it is only called when the process terminates cleanly, i.e. when it returns a stop tuple from its callbacks, or when it is trapping exits and receives an exit signal from its parent process. It is not called when the process is killed brutally (for example with an untrappable :kill exit signal) or when it dies from an exit signal it isn’t trapping.
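
For example, assuming the MyGenserver from the original post (which traps exits in init/1):

{:ok, pid} = MyGenserver.start_link([])
# the exit signal comes from the parent that called start_link, so the
# GenServer shuts down with reason :disqualified and terminate/2 runs
Process.exit(pid, :disqualified)

{:ok, pid} = MyGenserver.start_link([])
# :kill cannot be trapped: the process is destroyed immediately,
# terminate/2 never runs, and whatever state it held is gone
Process.exit(pid, :kill)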


Given you’re working on a game, I’m wondering if exiting is a good way of modeling :disqualified. Maybe you can mark processes as :disqualified within your game loop, then after the iteration check the states of all your processes, and only after that let the disqualified processes exit. Basically, separate the game state from the process lifecycles.
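
Something along these lines, purely as a sketch (the :get_state call and the map layout are made up here):

{:ok, p1} = MyGenserver.start_link([])
{:ok, p2} = MyGenserver.start_link([])

game = %{p1: %{pid: p1, status: :alive}, p2: %{pid: p2, status: :alive}}

# a round decides p2 lost: record it in the game state, don't kill the process yet
game = put_in(game[:p2][:status], :disqualified)

# collect the final state from every player while they are all still alive
results = Map.new(game, fn {id, player} -> {id, GenServer.call(player.pid, :get_state)} end)

# only now shut the disqualified processes down
for {_id, %{status: :disqualified, pid: pid}} <- game, do: GenServer.stop(pid)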


You’re presumably using a single “parent” process to supervise all the competing processes in the game - I recommend having that process trap exits and handle the {:EXIT, pid, reason} messages that arrive when linked processes exit (or monitor them and handle the corresponding :DOWN messages).


There are two basic ways of following the lifecycle of a process from outside of it.

  1. Linking to the process we want to follow
  2. Monitoring the process we want to follow

1 - is useful when you want the exits themselves to be linked, so either:

a) the parent process exits when the child process exits, and vice-versa;

or

b) you want to follow the child process’s exit without exiting yourself (by trapping exits in the parent), while still having the child exit automatically if and when the parent dies, so you don’t leave lingering child processes that only make sense in the context of the parent. (You could also trap exits in the children and not the parent to get the opposite behaviour, but I’ve never run into a situation that needed it.)

2 - is useful when

a) you just want to monitor the child process and the child process is known to always exit (meaning you don’t need to worry about lingering child processes because they will inevitably run their course)

b) you want to follow the lifecycle but for some reason don’t want to enable trapping exits on the parent
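
In code, the two look roughly like this (a bare-bones sketch, outside of any GenServer):

# 1. linking + trapping exits: the exit becomes an {:EXIT, ...} message
Process.flag(:trap_exit, true)
pid = spawn_link(fn -> exit(:disqualified) end)

receive do
  {:EXIT, ^pid, reason} -> IO.inspect(reason, label: "linked process exited")
end

# 2. monitoring: no link, the death arrives as a {:DOWN, ...} message
{pid, ref} = spawn_monitor(fn -> exit(:normal) end)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} -> IO.inspect(reason, label: "monitored process exited")
end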

Given your situation, the conceptual approach you took with linking seems the most reasonable. The problem seems to be that you’re linking the shell process to the gen_server (and probably not trapping exits in the shell itself), and then, after linking, issuing an exit to that gen_server, which propagates to the shell due to the link. I also wonder whether you have a proper handle_info for the {:EXIT, pid, reason} message the gen_server will receive when trapping exits?

My approach would be for the MyGenserver module (the parent) to store the exit messages it receives from the child processes in its own state, with a proper handler for each exit case you want to distinguish. You also need to keep track of all the started processes, so that at the end you can distinguish the ones that haven’t exited (or force them to exit) from the ones that did exit during the running cycle.

So basically:

State Data Form: %{running: %{}, finished: %{}}

handles:

init, handle_info for the exit messages, a handle/logic to decide when it's finished?

Step1: init -> trap_exits,

Step2: start the child processes -> store each pid in a map in the state of the parent MyGenserver (the running map), for instance as pid -> identifier key-value pairs

Step3: wait for exit messages -> they come as {:EXIT, pid, reason}. As the children exit, use the pid to remove them from the running map (with Map.pop/2 you get access to the value of the popped key) and store their exit reason under that popped identifier in the finished map

Step4: when finished -> you now have a map of all the early exits in finished, and you can for instance
a) iterate the remaining keys in the running map and add them to the finished map as normal (meaning they stayed alive until the end), or
b) iterate the remaining keys in the running map and send an exit signal to each pid still alive (use a reason other than :normal - an exit signal with reason :normal sent to another process won’t terminate it), which will make them exit and deliver the regular exit message back to the parent (due to the link and the trapping), where the same logic adds them to the finished map.

You can now write this to a file, to a database, or wherever you want, and then exit the MyGenserver (which forces any still-alive child processes to exit due to the linking, in case you didn’t exit them before).

An example would be (with the caveat that here all processes exit so step4 isn’t shown):

defmodule Game.Server do
  use GenServer
  require Logger

  defstruct [running: %{}, finished: %{}]

  def start_link() do
    # naming it the name of the module only allows one game server
    # probably not what you want, but just for the example
    GenServer.start_link(__MODULE__, %__MODULE__{}, [name: __MODULE__])
  end

  def init(state) do
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  def start_players(list_of_players) do
    Enum.each(list_of_players, fn(identifier) -> start_player(identifier) end)
  end

  def start_player(identifier) do
    GenServer.call(__MODULE__, {:start_player, identifier})
  end

  def handle_call({:start_player, identifier}, _from, %{running: running} = state) do
    spawned_pid = Process.spawn(fn -> 
      Process.sleep(5000)
      case Enum.random([true, false]) do
        true -> Process.exit(self(), :normal)
        false -> Process.exit(self(), :disqualified)
      end
    end, [:link])
    
    n_running = Map.put(running, spawned_pid, identifier)
    {:reply, :ok, %{state | running: n_running}}
  end

  def handle_info({:EXIT, pid, reason}, %{running: running, finished: finished} = state) when :erlang.is_map_key(pid, running) do
    {identifier, n_running} = Map.pop(running, pid)
    n_finished = Map.put(finished, identifier, {reason, DateTime.utc_now()})
    {:noreply, %{state | running: n_running, finished: n_finished}, {:continue, :maybe_finished}}
  end

  def handle_info({:EXIT, pid, reason}, state) do
    Logger.warn("#{inspect pid} exited with reason #{reason} but wasn't in the running map")
    {:noreply, state}
  end

  def handle_continue(:maybe_finished, %{running: running, finished: finished} = state) when running == %{} do
    Logger.info("Finished all player processes:\n #{inspect finished}")
    {:stop, :normal, state}
  end

  def handle_continue(:maybe_finished, state), do: {:noreply, state}
end
iex(5)> {:ok, pid} = Game.Server.start_link()      
{:ok, #PID<0.1316.0>}                        
iex(6)> Game.Server.start_players([:a, :b, :c, :d])
:ok                                          
iex(7)> [info] Finished all player processes:
 %{a: {:disqualified, ~U[2020-04-08 08:40:06.033692Z]}, b: {:normal, ~U[2020-04-08 08:40:06.033699Z]}, c: {:disqualified, ~U[2020-04-08 08:40:06.033663Z]}, d: {:disqualified, ~U[2020-04-08 08:40:06.033703Z]}}
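
Since the caveat above mentions step 4 isn’t shown, one possible (untested, purely illustrative) addition to Game.Server would be a call that closes the game by moving whoever is still running into the finished map as survivors:

def finish_game(), do: GenServer.call(__MODULE__, :finish_game)

def handle_call(:finish_game, _from, %{running: running, finished: finished} = state) do
  # whoever is still alive at this point survived until the end of the game;
  # the player processes themselves are still linked, so you could also
  # Process.exit/2 them here with a reason other than :normal if you want them gone
  survivors = Map.new(running, fn {_pid, identifier} -> {identifier, {:survived, DateTime.utc_now()}} end)
  n_finished = Map.merge(finished, survivors)
  {:reply, n_finished, %{state | running: %{}, finished: n_finished}}
end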

Thank you – these responses have been very educational. I’m thinking out loud a little more here… I’ve noticed that I have to think about structure differently when data lives in a process than when I think of data living with a resource (i.e. a model/database).

I think this will work just fine if the parent “game” process stores the state – the child processes (i.e. the “players”) can simply return data when their functions are called (all “player” modules implement the same behaviour/callbacks). Although I could send a message back to the game’s PID, I don’t see any benefit in doing that. The parent game process just passes around a large struct of game_data, e.g. something like:

game_data = %BigStruct{...}

{:ok, player1} = Player.start_link(Player1Module)
{:ok, player2} = Player.start_link(Player2Module)
# ... etc ...

# Players react to the state of the game:
player1_response = Player1Module.react(game_data)
game_data = Map.put(game_data, :p1_response, player1_response)

The only question of form I have is whether it’s better to pass around a large struct of game data, or to pass around the game’s PID? I don’t know if there’s any performance penalty one way or another, but passing the game’s PID around seems more elegant than passing around the entire struct of the game’s state.

In a turn-based game I developed, I just use a gen_server for the game in question (several can be running at the same time, but each one pertains to a single game only), and this gen_server uses a struct with all the relevant pieces of data; “actions” taken by the players are simply processed by that gen_server, which takes the game “state” and applies whatever commands correspond to the action.

In that case, since it’s a turn-based model, you don’t really need extra processes for the game logic (I do spawn processes for persisting the state into pg after each action, as a backup), but the transformations to the state themselves are processed by the gen_server alone (blocking is irrelevant, actually even desirable, in a turn-based model).
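
Schematically, the turn-based version looks something like this (module and function names invented for illustration):

defmodule TurnBasedGame do
  use GenServer

  # the single source of truth for one game lives in this process's state
  def start_link(initial_state), do: GenServer.start_link(__MODULE__, initial_state)

  def init(state), do: {:ok, state}

  # a player's action is a synchronous call; blocking is fine (even desirable)
  # because the game is turn-based
  def handle_call({:action, player, action}, _from, game_state) do
    new_state = apply_action(game_state, player, action)
    # persistence (e.g. a backup write to pg) could be kicked off here in a spawned process
    {:reply, {:ok, new_state}, new_state}
  end

  # placeholder for whatever the actual game rules do
  defp apply_action(game_state, player, action) do
    Map.update(game_state, :log, [{player, action}], &[{player, action} | &1])
  end
end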

If, on the other hand, the game were concurrent, where each player can take actions at their discretion rather than following a defined order, then it could make sense to have a main gen_server holding the game plus child processes, and make them communicate, since that allows the flow to be asynchronous. The main gen_server (or gen_statem, whatever) then has only three concerns: it is the source of truth regarding the state, it routes requests for actions to the appropriate “child”, and it receives the updates/commands resulting from those requests, which it uses to update the state accordingly, broadcast back to the relevant players, etc.

But it all depends on the type of game and how it’s played - it’s also more complex to orchestrate actions being dispatched to the children while the updates come from somewhere external to the source of truth. Perhaps it would make sense to use mnesia, with a table/record per game, as the source of truth that the child processes interact with. I say mnesia and not ets because mnesia supports transactions: if the child processes are interacting simultaneously with the underlying source of truth, then transactions are good, and if it’s all on the same server, mnesia is almost the same as ets in terms of speed.

If everything is synchronous in terms of processing, a plain data structure is probably the right way, as long as it’s processed in the same process - if you send it around, it incurs copying from process to process (which might be relevant, or not). If, on the other hand, you need several processes to model concurrent gameplay, then store the pids, use a main process just as a dispatcher, and use ets (with some process acting as a serialization point) or mnesia (no need for another manually managed process) as the source of truth that the child processes interact with directly. With mnesia, as long as you use transactions and always fetch the value(s) you’re changing at the beginning of the transaction and operate on those, you are set.
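
The mnesia pattern mentioned above looks roughly like this (the :game table and its two attributes are invented; it assumes :mnesia.start() has been called and the table created, e.g. with :mnesia.create_table(:game, attributes: [:id, :state])):

# read-modify-write inside a transaction so concurrent child processes
# can't clobber each other's updates
update_game = fn game_id, fun ->
  :mnesia.transaction(fn ->
    case :mnesia.read({:game, game_id}) do
      [{:game, ^game_id, game_state}] ->
        :mnesia.write({:game, game_id, fun.(game_state)})

      [] ->
        :mnesia.abort(:not_found)
    end
  end)
end

# usage: bump a hypothetical turn counter atomically
update_game.(1, fn game_state -> Map.update(game_state, :turn, 1, &(&1 + 1)) end)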

But again this is just spouting options - it really depends on the gameplay you would be trying to model.