Stale/Deadlock Process

dc0d · November 11, 2018, 7:56pm

Assume there is a function like:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    spawn_link(fn ->
      Process.flag(:trap_exit, true)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)

    :timer.sleep(100)
  end)

  receive do
    msg ->
      msg
  end
end

This works. But if we remove the sleep, the process becomes stale/deadlocked. Why?

sasajuric · November 11, 2018, 8:08pm

I wouldn’t call it a deadlock, but rather a race condition. To see why it happens, let’s first name these processes:

def sample do
  # process A
  lab_pid = self()

  spawn_link(fn ->
    # process B

    spawn_link(fn ->
      # process C

      Process.flag(:trap_exit, true)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)
  end)

  receive do
    msg ->
      msg
  end
end

So without the sleep, it’s possible that process B stops before Process.flag in process C has been invoked. This happens because spawn and spawn_link are asynchronous. When these functions return, the process has been created, but it may not have executed any instructions yet.

So in this scenario, if process C starts trapping exits after process B has stopped, the corresponding exit signal won’t be converted into a message. The signal reaches C while it is not yet trapping exits, and because its reason is :normal, it is simply ignored. So now, you have process C waiting forever for a message which will never arrive, and consequently, process A is also waiting forever for a message which process C will never send.
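
To make the losing side of the race easier to observe, here is a minimal sketch that forces the ordering described above. It is not the original code: the module name `TrapRaceDemo`, the `:trap_now` and `:missed` messages, and the 50 ms and 200 ms delays are all assumptions added for illustration. Process C deliberately postpones `Process.flag(:trap_exit, true)` until process B is already gone, and a timeout replaces the endless wait so the function returns.

```elixir
defmodule TrapRaceDemo do
  # Illustrative sketch only; names, messages, and delays are hypothetical.
  def run do
    lab_pid = self()

    # Process C: waits for :trap_now before it starts trapping exits.
    c =
      spawn(fn ->
        receive do
          :trap_now -> :ok
        end

        Process.flag(:trap_exit, true)

        receive do
          {:EXIT, _pid, :normal} -> send(lab_pid, :done)
        after
          # Stand-in for "waiting forever", so the demo terminates.
          200 -> send(lab_pid, :missed)
        end
      end)

    # Process B: links to C and then exits normally right away.
    b = spawn(fn -> Process.link(c) end)
    ref = Process.monitor(b)

    # Wait until B is gone.
    receive do
      {:DOWN, ^ref, :process, ^b, _reason} -> :ok
    end

    # Small grace period so B's exit signal has reached C before C traps.
    Process.sleep(50)
    send(c, :trap_now)

    receive do
      msg when msg in [:done, :missed] -> msg
    end
  end
end
```

Calling `TrapRaceDemo.run()` should come back with `:missed` rather than `:done`: the `:normal` exit signal reached C before it was trapping exits, so no `{:EXIT, _, :normal}` message was ever placed in its mailbox. The sleep makes the ordering reliable in practice, but it is a timing aid, not a guarantee.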

The solution would be to use synchronous start, e.g. with :proc_lib:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    :proc_lib.start_link(Kernel, :apply, [fn ->
      Process.flag(:trap_exit, true)

      # `:proc_lib.start_link` will return after this function is invoked
      :proc_lib.init_ack({:ok, self()})

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end, []])
  end)

  receive do
    msg ->
      msg
  end
end

dc0d · November 11, 2018, 8:18pm

You are right! A simpler way (for me) is informing the parent that the child is started/ready:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    parent_pid = self()

    spawn_link(fn ->
      Process.flag(:trap_exit, true)
      send(parent_pid, :started)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)

    # :timer.sleep(100)
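    # Block until the child reports that it is trapping exits,
    # then return so that this process exits normally
    receive do
      :started -> :ok
    end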
  end)

  receive do
    msg -> msg
  end
end

sasajuric · November 11, 2018, 8:57pm

This is what :proc_lib.start_link and :proc_lib.init_ack do under the hood too. See here and here.

OvermindDL1 · November 13, 2018, 9:25pm

This is also a good example of why monitors are often better than links+traps.

dc0d · November 14, 2018, 5:48pm

Since the child here is trapping the exit of the parent (to do some cleanups, for example), wouldn’t the situation be the same? For example, the parent process can exit before the Process.monitor(parent_pid) expression executes inside the child process.

OvermindDL1 · November 14, 2018, 6:35pm

That’s why a parent supervisor of both should ensure that the monitoring process always outlives the children.

sasajuric · November 14, 2018, 6:35pm

In this case you would still receive a :DOWN message with an exit reason set to :noproc, so a monitor would in fact work without additional synchronization. That said, if we’re talking about a parent-child relationship, the usual approach in OTP is to use exit signals to do the cleanup in the child if the parent terminates.
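
The :noproc behaviour can be checked with a small sketch. The module and function names (`MonitorDemo`, `late_monitor`) and the one-second timeout are assumptions made for this example. It first waits for a short-lived process to terminate, then sets up a second monitor on the already dead pid and returns the reason carried by the resulting :DOWN message.

```elixir
defmodule MonitorDemo do
  # Illustrative sketch only; names and timeout are hypothetical.
  def late_monitor do
    pid = spawn(fn -> :ok end)

    # The first monitor is only used to know for sure that the process is gone.
    first_ref = Process.monitor(pid)

    receive do
      {:DOWN, ^first_ref, :process, ^pid, _reason} -> :ok
    end

    # The second monitor is created after the process has terminated.
    late_ref = Process.monitor(pid)

    receive do
      {:DOWN, ^late_ref, :process, ^pid, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end
```

`MonitorDemo.late_monitor()` returns `:noproc`: the late monitor still produces a :DOWN message, so the watcher is not left waiting the way the late `trap_exit` was in the linked version.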

OvermindDL1 · November 14, 2018, 6:54pm

Oh yes, that. I was reading it the other way around (that the parent would die while it’s the parent calling monitor on the child)! Yes, that is why monitors should be used: they are always ‘safe’. You will always receive a :DOWN message, even if the process went down before the monitor was set up. Monitors are much better than links for watching process lifetimes; links should only be used when processes should actually be linked and die together.