Avoiding race conditions with GenServer.call/2

wmnnd · March 26, 2017, 11:44am

Hi there,

when trying to use GenServer.call/2 with a pid that doesn’t exist (anymore), the calling process gets terminated.

Of course, one could use GenServer.whereis/1 to check if the GenServer is currently running but that would still introduce a potential race condition in which the GenServer is terminated from another process between the calls to GenServer.whereis/1 and GenServer.call/2.

Here is a little module to illustrate the problem:

defmodule Foo do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, :bar, name: __MODULE__)
  end

  def shutdown(delay) do
    Task.async(fn ->
      :timer.sleep(delay)
      GenServer.stop(__MODULE__)
    end)
  end

  def query(delay) do
    Task.async(fn ->
      :timer.sleep(delay)
      GenServer.whereis(__MODULE__)
      |> case do
        nil ->
          IO.puts "GenServer #{__MODULE__} has gone away"
        pid ->
          :timer.sleep(delay)
          GenServer.call(pid, {:query})
          |> IO.inspect
      end
    end)
  end

  def handle_call({:query}, _from, state),
    do: {:reply, state, state}
end

If you call Foo.start_link && Foo.shutdown(100) && Foo.query(10), everything will work out nicely and the state will be printed.

When you call Foo.start_link && Foo.shutdown(100) && Foo.query(150), you also get the expected behavior: a message that the GenServer has gone away will be printed.

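When calling Foo.start_link && Foo.shutdown(100) && Foo.query(75), however, the Task created in Foo.query/1 crashes: GenServer.whereis/1 still finds the process after 75 ms, but the GenServer has already been stopped by the time GenServer.call/2 runs at 150 ms.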

The reason for this is that GenServer.call/3 internally calls Kernel.exit/1 if GenServer.whereis/1 returns nil.

So what is the recommended way of only messaging GenServer if it’s still alive? Should I be trapping the exits for processes that try to call GenServers that have potentially been terminated?

Doesn’t this also mean that GenServer.call/3 itself is vulnerable to the same race condition because it uses the same check as the Foo module shown above?
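To make the exit described above visible from the caller's side, here is a minimal sketch. It assumes a throwaway process created with spawn/1 as a stand-in for the stopped GenServer, and the reason shape in the comment is what recent Elixir versions appear to produce, so treat it as an observation rather than a guarantee.

```elixir
# Start a throwaway process and wait until it has exited,
# so that `pid` is guaranteed to be stale (stand-in for a stopped GenServer).
pid = spawn(fn -> :ok end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# Calling the dead pid makes the caller exit; catch the exit to inspect the reason.
result =
  try do
    GenServer.call(pid, {:query})
  catch
    :exit, reason -> {:exited, reason}
  end

# The reason is expected to look like
# {:noproc, {GenServer, :call, [pid, {:query}, 5000]}}
IO.inspect(result)
```

Because this is an exit and not a raised exception, a plain rescue clause would not intercept it; it has to be caught with catch :exit.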

NobbZ · March 26, 2017, 12:25pm

Try to avoid using a pid and use a name instead. Of course, letting the calling process crash is totally idiomatic. When the GenServer isn’t running anymore you won’t get a meaningful result anyway, so your caller would probably crash on a faulty value regardless.

dom · March 26, 2017, 12:33pm

Just catch it:

try do
  GenServer.call(pid, :hi)
catch
  :exit, {:noproc, _} -> :ok_never_mind
end

Elixir’s GenServer.call does use whereis, but I think the point is to convert registered names into PIDs, not to ensure the process is alive. It still does a try/catch.

(Edit: actually :gen.call seems to accept names, so Elixir might be doing that just to provide a helpful message when calling self)
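The try/catch above can be wrapped in a small helper so callers get a tagged tuple instead of an exit. This is only a sketch: SafeCall and its return values are hypothetical names made up for illustration, not part of GenServer. It also assumes that a server stopping while the call is in flight surfaces as an exit tagged with the stop reason (:normal, :shutdown or {:shutdown, term}), which is how GenServer.call appears to behave; all other exits, such as timeouts or crashes, are deliberately left to propagate.

```elixir
defmodule SafeCall do
  # Hypothetical helper: wraps GenServer.call/3 and turns "server is gone"
  # exits into return values. Any other exit (timeout, crash) still propagates.
  def call(server, request, timeout \\ 5_000) do
    try do
      {:ok, GenServer.call(server, request, timeout)}
    catch
      # The process (or registered name) did not exist when the call was made.
      :exit, {:noproc, _} ->
        {:error, :noproc}

      # The server stopped cleanly while the call was in flight.
      :exit, {reason, _} when reason in [:normal, :shutdown] ->
        {:error, {:stopped, reason}}

      :exit, {{:shutdown, _} = reason, _} ->
        {:error, {:stopped, reason}}
    end
  end
end
```

With the Foo module from the first post, SafeCall.call(Foo, {:query}) would return {:ok, :bar} while the server is running and {:error, :noproc} once it has been stopped, with no whereis check needed beforehand.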

cdegroot · March 27, 2017, 1:23pm

That’s probably what you want. Assuming you have a correct supervision hierarchy, both processes will be restarted and stuff will (eventually) work again. Also, on using pids to refer to other GenServers, Saša Jurić’s talk “Discovering Processes” (ElixirConfEU 2016, on Vimeo) is an excellent exposition of what to avoid and how to properly set things up.

wmnnd · March 27, 2017, 3:39pm

Catching the error like this seems like a good solution. Thank you!

wmnnd · March 27, 2017, 3:41pm

In my case it’s not what I want, because I actually expect that the GenServer might have been stopped on purpose, and then I want my process to respond accordingly. But thank you for the video suggestion, I am definitely going to watch the talk.