Confused with GenServer/OTP timing

future-cyborg · August 30, 2020, 7:26pm

Hey all, I’m writing some tests around a GenServer module and I’ve discovered some behavior that seems strange to me. It has definitely shown me that I don’t understand what is actually happening. The code is similar to what is in getting-started/genserver, except the monitored processes aren’t Agents.

Players can join, and will be removed if their connection is closed. Here are the callbacks.

  @impl true
  def handle_call({:join, name}, from, {names, refs, room_code}) do
    {join_proc, _} = from

    if Map.has_key?(names, name) do
      {:reply, {:error, :name_taken}, {names, refs, room_code}}
    else
      ref = Process.monitor(join_proc)
      refs = Map.put(refs, ref, name)
      names = Map.put(names, name, join_proc)
      {:reply, :ok, {names, refs, room_code}}
    end
  end

  @impl true
  def handle_call(:get_state, _from, state) do
    {:reply, state, state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, {names, refs, room_code}) do
    {name, refs} = Map.pop(refs, ref)
    names = Map.delete(names, name)
    {:noreply, {names, refs, room_code}}
  end

And the test

  test "removes player on exit", %{lobby: lobby} do
    pid = spawn fn -> PassIt.Game.Lobby.join(lobby, "bob") end

    {names, _, _} = PassIt.Game.Lobby.get_state(lobby)
    assert names == %{}
    # Process.alive?(pid)  # This somehow triggers the process ending?
    # IO.inspect(names)

    assert :ok = PassIt.Game.Lobby.join(lobby, "bob")  # This assert fails
  end

The last assert fails, but if I uncomment either of those two lines it passes. I’m really confused about how the first assert can succeed while the second fails. If the names map is empty, then the process has already been removed and the call to join() shouldn’t return {:error, :name_taken}, but it does.

What is happening with the timing of this code that is causing this? I appreciate the help understanding this!

joey_the_snake · August 30, 2020, 7:55pm

Mayhaps my reasoning is nonsense, but I’m thinking this is what happens when those lines are commented out:

1. The process is spawned.
2. The call to get_state in the main process happens before the spawned process can call join.
   - I believe when spawn returns, you’re only guaranteed that the process has started, not that its function has completed, but I could be wrong.
3. The call to join in the spawned process happens.
4. The call to join in the main process happens before the spawned process has exited and the GenServer has handled the resulting :DOWN message.

Now when those lines are not commented out, there is a slight delay between getting the state and trying to join “bob” again. This gives the spawned process enough time to exit and the GenServer enough time to receive the :DOWN message.
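The two facts this reasoning relies on can be checked in isolation. The following is only a minimal sketch, not the lobby code from the question: the spawned function is made to block on a hypothetical :continue message so the ordering is explicit rather than left to the scheduler, and the variable names are made up for illustration.

```elixir
parent = self()

# The spawned function blocks until it is told to continue, which pins down
# the ordering instead of leaving it to the scheduler.
pid =
  spawn(fn ->
    receive do
      :continue -> send(parent, :body_ran)
    end
  end)

# spawn/1 has already returned, yet the function body has not finished:
# the process is still alive, waiting in its receive.
alive_after_spawn = Process.alive?(pid)

ref = Process.monitor(pid)
send(pid, :continue)

# :DOWN is an ordinary message that shows up in the monitoring process's
# mailbox after the monitored process exits, so it has to be waited for.
down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

# The body's own message was sent before the process exited.
body_ran =
  receive do
    :body_ran -> true
  after
    1_000 -> false
  end

IO.inspect({alive_after_spawn, down_reason, body_ran})
```

In the failing test nothing waits like this, so whether the lobby has handled :DOWN by the time the second join arrives depends on scheduling.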

future-cyborg · August 30, 2020, 10:40pm

Great feedback, thanks! I put some counters in the code so I could tell when each callback was happening. You were correct in that the join() inside the spawned process is actually happening before the get_state() call. What I really wanted to use here is Task.async() and Task.await(). (I assume there are other ways as well to achieve this.)

I changed the test to this, and got the expected results.

  test "removes player on exit", %{lobby: lobby} do
    # Task.async/1 returns a %Task{} struct rather than a bare pid
    task = Task.async(fn -> PassIt.Game.Lobby.join(lobby, "bob") end)
    Task.await(task)

    {names, _, _} = PassIt.Game.Lobby.get_state(lobby)
    assert names == %{}

    assert :ok = PassIt.Game.Lobby.join(lobby, "bob")
  end
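One caveat with the Task.await version: Task.await only guarantees that the task's reply has arrived. The task process may still be exiting at that point, and the lobby may not have handled its :DOWN message yet, so a small window for the same race probably remains. A more defensive option is to poll until the lobby reports the player gone. The sketch below is an assumption-laden illustration, not the real module: LobbySketch is a hypothetical stand-in for PassIt.Game.Lobby (without room_code), and Eventually.wait_until is a made-up helper.

```elixir
defmodule LobbySketch do
  # Hypothetical stand-in for PassIt.Game.Lobby, reduced to names and refs.
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, {%{}, %{}})
  def join(lobby, name), do: GenServer.call(lobby, {:join, name})
  def names(lobby), do: GenServer.call(lobby, :names)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:join, name}, {pid, _tag}, {names, refs}) do
    if Map.has_key?(names, name) do
      {:reply, {:error, :name_taken}, {names, refs}}
    else
      # Monitor the caller so its exit removes the name later on.
      ref = Process.monitor(pid)
      {:reply, :ok, {Map.put(names, name, pid), Map.put(refs, ref, name)}}
    end
  end

  def handle_call(:names, _from, {names, _refs} = state), do: {:reply, names, state}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, {names, refs}) do
    {name, refs} = Map.pop(refs, ref)
    {:noreply, {Map.delete(names, name), refs}}
  end
end

defmodule Eventually do
  # Hypothetical helper: retry `fun` until it returns true or attempts run out.
  def wait_until(fun, attempts \\ 50, interval_ms \\ 10)
  def wait_until(_fun, 0, _interval_ms), do: false

  def wait_until(fun, attempts, interval_ms) do
    if fun.() do
      true
    else
      Process.sleep(interval_ms)
      wait_until(fun, attempts - 1, interval_ms)
    end
  end
end

{:ok, lobby} = LobbySketch.start_link()

# The task joins as "bob" and then exits, which triggers :DOWN in the lobby.
task = Task.async(fn -> LobbySketch.join(lobby, "bob") end)
first_join = Task.await(task)

# Wait for the lobby to have handled :DOWN instead of assuming it already has.
removed? = Eventually.wait_until(fn -> LobbySketch.names(lobby) == %{} end)
second_join = LobbySketch.join(lobby, "bob")

IO.inspect({first_join, removed?, second_join})
```

Polling costs a few milliseconds in the worst case, but the test then no longer depends on how quickly the task process happens to exit.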