Stale/Deadlock Process

Assume there is a function like:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    spawn_link(fn ->
      Process.flag(:trap_exit, true)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)

    :timer.sleep(100)
  end)

  receive do
    msg ->
      msg
  end
end

This works. But if we remove the sleep, the call hangs forever, as if it were stalled or deadlocked. Why?


I wouldn’t call it a deadlock, but rather a race condition. To see why it happens, let’s first name these processes:

def sample do
  # process A
  lab_pid = self()

  spawn_link(fn ->
    # process B

    spawn_link(fn ->
      # process C

      Process.flag(:trap_exit, true)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)
  end)

  receive do
    msg ->
      msg
  end
end

So without the sleep, it’s possible that process B stops before process C has invoked Process.flag. This happens because spawn and spawn_link are asynchronous: when these functions return, the new process has been created, but it may not have executed a single instruction yet.

So in this scenario, if process C starts trapping exits only after process B has already stopped, the corresponding exit signal won’t be converted into a message. Because its reason is :normal and C isn’t trapping exits yet when it arrives, the signal is simply ignored (C doesn’t crash either). So now process C waits forever for a message that will never arrive, and consequently process A also waits forever for a message that process C will never send.
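The crux is that an exit signal with reason :normal is silently dropped by a process that isn’t trapping exits. Here is a minimal sketch demonstrating this, assuming a hypothetical ExitSignalDemo module. To keep the demo itself free of races, it sends the signal directly with Process.exit/2 instead of going through a link (the same delivery rule applies), and the 100 ms wait is an arbitrary choice.

```elixir
defmodule ExitSignalDemo do
  # Spawns a process that sets its trap_exit flag to `trap?`, then sends it an
  # exit signal with reason :normal and reports what that process observed.
  def observe(trap?) do
    parent = self()

    pid =
      spawn(fn ->
        Process.flag(:trap_exit, trap?)
        # Tell the parent the flag is already set, so the demo has no race.
        send(parent, :ready)

        receive do
          {:EXIT, _from, reason} -> send(parent, {:exit_message, reason})
        after
          # Arbitrary window: if nothing arrives, the signal was dropped.
          100 -> send(parent, :no_message)
        end
      end)

    receive do
      :ready -> :ok
    end

    # Same rule as the :normal exit signal process B sends over its link:
    # ignored unless the receiver traps exits.
    Process.exit(pid, :normal)

    receive do
      {:exit_message, reason} -> {:exit_message, reason}
      :no_message -> :no_message
    end
  end
end
```

ExitSignalDemo.observe(true) returns {:exit_message, :normal}, while ExitSignalDemo.observe(false) returns :no_message, which is exactly what happens to process C when it sets the flag too late.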

One solution is to start process C synchronously, e.g. with :proc_lib:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    :proc_lib.start_link(Kernel, :apply, [fn ->
      Process.flag(:trap_exit, true)

      # `:proc_lib.start_link` (in process B) returns only after this call,
      # i.e. once process C is already trapping exits
      :proc_lib.init_ack({:ok, self()})

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end, []])
  end)

  receive do
    msg ->
      msg
  end
end
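To check that the synchronous start really closes the race, one option is to run it many times with a timeout on the final receive, so a lost message would show up as :timeout instead of a hang. A sketch under that assumption (the ProcLibSample module name and the 1_000 ms timeout are illustrative, not part of the original):

```elixir
defmodule ProcLibSample do
  def sample do
    lab_pid = self()

    # Process B
    spawn_link(fn ->
      # Blocks until process C calls :proc_lib.init_ack/1, i.e. until C is
      # already trapping exits.
      {:ok, _c_pid} =
        :proc_lib.start_link(Kernel, :apply, [
          fn ->
            # Process C
            Process.flag(:trap_exit, true)
            :proc_lib.init_ack({:ok, self()})

            receive do
              {:EXIT, _pid, :normal} -> send(lab_pid, :done)
            end
          end,
          []
        ])

      # B now returns and exits with :normal; C receives that as a message.
    end)

    receive do
      :done -> :done
    after
      1_000 -> :timeout
    end
  end
end
```

Running `Enum.all?(1..200, fn _ -> ProcLibSample.sample() == :done end)` should return true every time, whereas the same loop over the version without the sleep gives no such guarantee.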

You’re right! A simpler way (for me) is to have the child tell the parent that it has started and is ready:

def sample do
  lab_pid = self()

  spawn_link(fn ->
    parent_pid = self()

    spawn_link(fn ->
      Process.flag(:trap_exit, true)
      send(parent_pid, :started)

      receive do
        {:EXIT, _pid, :normal} ->
          send(lab_pid, :done)
      end
    end)

    # Instead of :timer.sleep(100), wait until C confirms it is trapping exits
    receive do
      :started -> :ok
    end
  end)

  receive do
    msg -> msg
  end
end

This is essentially what :proc_lib.start_link and :proc_lib.init_ack do under the hood too 🙂
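The same handshake can be factored into a small reusable helper. The sketch below uses a hypothetical SyncStart.start_link/1 (not an OTP or Elixir API) that passes the child an `ack` function and tags the acknowledgement with a unique reference from make_ref/0, so it can’t be confused with other messages in the parent’s mailbox:

```elixir
defmodule SyncStart do
  # Hypothetical helper: spawns a linked process running `fun.(ack)` and
  # blocks until that process calls `ack.()`.
  def start_link(fun) do
    parent = self()
    ref = make_ref()

    pid =
      spawn_link(fn ->
        # self() is evaluated when ack is called, i.e. inside the child.
        ack = fn -> send(parent, {:ack, ref, self()}) end
        fun.(ack)
      end)

    receive do
      {:ack, ^ref, ^pid} -> {:ok, pid}
    end
  end
end

defmodule HandshakeSample do
  def sample do
    lab_pid = self()

    # Process B
    spawn_link(fn ->
      {:ok, _c_pid} =
        SyncStart.start_link(fn ack ->
          # Process C: acknowledge only once exits are being trapped.
          Process.flag(:trap_exit, true)
          ack.()

          receive do
            {:EXIT, _pid, :normal} -> send(lab_pid, :done)
          end
        end)
    end)

    receive do
      :done -> :done
    after
      # Illustrative timeout so a lost message shows up instead of hanging.
      1_000 -> :timeout
    end
  end
end
```

Unlike :proc_lib.start_link, this sketch has no start timeout and doesn’t handle a child that exits normally before acknowledging (the caller would then wait forever), which is one reason to prefer :proc_lib in real code.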


This is also a good example of why monitors are often better than links + trapping exits. 🙂


Since here the child is trapping the exit of the parent (to do some cleanups, for example), wouldn’t the situation be the same? For example, the parent process can exit before the Process.monitor(parent_pid) expression executes inside the child process.

That’s why a parent supervisor of both should ensure that the monitoring process always outlives the children.

In this case you would still receive a :DOWN message with the exit reason set to :noproc, so the monitor would in fact work without any additional synchronization. That said, if we’re talking about a parent-child relationship, the usual approach in OTP is to use exit signals to do the cleanup in the child if the parent terminates.
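A sketch of both points, assuming a hypothetical MonitorDemo module: monitoring a process that is already dead still delivers a :DOWN message with reason :noproc, so the child-watches-parent case from the question needs no extra synchronization.

```elixir
defmodule MonitorDemo do
  # Monitoring an already-dead pid still yields a :DOWN message (:noproc).
  def monitor_dead_process do
    pid = spawn(fn -> :ok end)

    # First monitor: only used to wait until the process has surely terminated.
    ref1 = Process.monitor(pid)

    receive do
      {:DOWN, ^ref1, :process, ^pid, _reason} -> :ok
    end

    # Second monitor on the now-dead pid.
    ref2 = Process.monitor(pid)

    receive do
      {:DOWN, ^ref2, :process, ^pid, reason} -> reason
    end
  end

  # The child monitors its parent, and the parent may already have exited
  # by the time Process.monitor/1 runs in the child.
  def child_watches_parent do
    lab_pid = self()

    spawn(fn ->
      parent_pid = self()

      spawn(fn ->
        ref = Process.monitor(parent_pid)

        receive do
          {:DOWN, ^ref, :process, ^parent_pid, reason} ->
            send(lab_pid, {:parent_down, reason})
        end
      end)

      # The parent returns right away, racing with the child's monitor call.
    end)

    receive do
      {:parent_down, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end
```

MonitorDemo.monitor_dead_process() returns :noproc. MonitorDemo.child_watches_parent() returns either :normal (the monitor was set up before the parent exited) or :noproc (the parent was already gone); in both cases the :DOWN message arrives.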

Oh yes, that. I was reading it the other way around (that the parent would die when it’s the parent calling monitor on the child)! Yes, that is why monitors should be used: they are always ‘safe’, and you will always receive a :DOWN message, even if the process went down before the monitor was set up. Monitors are much better than links for watching process lifetimes; links should only be used when you actually want the linked processes to die together.
