Is it possible to stop, but not kill, a process?

tiny_sal · November 4, 2022, 11:45am

Hey all!

Is it possible to stop but not kill processes in Elixir?

I’ve got a data structure that has a complex web of connected processes. I’m trying to stop the processes executing their mailbox queue, but keep them in memory so I can start them again with new data - possible?

Another way to think of this is that I’m trying to clear a process mailbox. I don’t want the process to shut down, just to stop working and be ready to receive new work instructions:

Process A is doing work and passing the results on to process B, which is doing work and passing the results back to process A - repeat.

This is working for my needs. I now just need some way to interrupt this back and forth so I can inspect the state of A and B.

I’ve tried implementing a custom stop function on A and B but the problem is, due to the recursive nature of what I’m doing, their mailboxes are too backed up for the stop call to run.

odix67 · November 4, 2022, 1:01pm

I’m not sure I understand you correctly, but I would implement such a “thing” using GenServers. What’s weird to me is that you want to empty the mailboxes of processes.

soup · November 4, 2022, 1:28pm

Possibly you want GenStage (gen_stage v1.1.2), which supports back pressure, going by your description. I’ve never done circular dependencies with it, though. Do you really need them to be circular?

al2o3cr · November 4, 2022, 1:34pm

A process that isn’t reading its mailbox is effectively dead, because that’s the only way other processes can interact with it or inspect its state.

One very generic comment on the architecture you’ve described: right now, each process does a particular thing and they hand work items back and forth. What about inverting that so a work item stayed within a single process that did multiple steps?
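One possible reading of that suggestion, as a minimal sketch: a single GenServer runs both steps on the work item and then sends itself one `:step` message to continue. Because at most one `:step` is ever queued, a pause call is handled promptly. The module name `Pipeline`, the `step_a/1` and `step_b/1` functions, and the `:step` message are all hypothetical and stand in for the real work done by A and B.

```elixir
defmodule Pipeline do
  use GenServer

  def start_link(value), do: GenServer.start_link(__MODULE__, value)
  def pause(pid), do: GenServer.call(pid, :pause)
  def resume(pid), do: GenServer.call(pid, :resume)

  @impl true
  def init(value) do
    # Kick off the loop with a single self-addressed message.
    send(self(), :step)
    {:ok, %{value: value, running: true}}
  end

  @impl true
  def handle_info(:step, %{running: true} = state) do
    # Both steps run inside this one process; no hand-off to a second process.
    value = state.value |> step_a() |> step_b()
    send(self(), :step)
    {:noreply, %{state | value: value}}
  end

  # While paused, a leftover :step message is dropped without doing any work.
  def handle_info(:step, state), do: {:noreply, state}

  @impl true
  def handle_call(:pause, _from, state) do
    {:reply, state.value, %{state | running: false}}
  end

  def handle_call(:resume, _from, %{running: false} = state) do
    send(self(), :step)
    {:reply, :ok, %{state | running: true}}
  end

  # Resuming a server that is already running must not queue a second :step.
  def handle_call(:resume, _from, state), do: {:reply, :ok, state}

  # Hypothetical stand-ins for the work A and B do.
  defp step_a(v) do
    Process.sleep(1)
    v + 1
  end

  defp step_b(v), do: v + 1
end

{:ok, pid} = Pipeline.start_link(0)
Process.sleep(50)

paused_value = Pipeline.pause(pid)
Process.sleep(50)
# While paused, the state can be inspected and should not change.
still_paused_value = :sys.get_state(pid).value

:ok = Pipeline.resume(pid)
Process.sleep(50)
resumed_value = Pipeline.pause(pid)

IO.inspect({paused_value, still_paused_value, resumed_value})
```

The trade-off is that the two steps no longer run concurrently, which may or may not matter for the workload in question.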

groovyda · November 4, 2022, 1:41pm

Would this not work? erlang:suspend_process

although it does say this BIF is for debugging only, so I’m guessing it’s not good practice for the reasons mentioned in this thread…

dimitarvp · November 4, 2022, 1:50pm

You might want Process.hibernate but I am with the others who say you should rethink your code.

derpycoder · November 4, 2022, 2:06pm

Does process A have to be a single process?

Can’t you drain the backlogged mailbox by spawning another process A’ and routing all the results that come from B to A’?

Once the process A is drained, you can inspect it. Then flip the switch like a Blue Green deployment, so A starts receiving results from B again.

You can do the same for B side, by switching on a process B’ and offloading B mailbox.

Inspired by Blue Green deployment, Load Balancers, Cockroach DB draining its node before shutting down, Hot Standby, and High Availability using Keepalived.

P.S. This is just speculation; I don’t know how GenStage can be modified for your situation.

rvirding · November 4, 2022, 2:59pm

Note that Process.hibernate does NOT suspend the process in any way; it just compacts it to a special format to save memory. When the process receives a message, it is unpacked to the normal process memory format and just keeps running.

The reason for adding hibernate was to save memory for processes which only run very seldom but still have to be there and alive. It is a memory <-> CPU tradeoff. This was back in the old days when we didn’t have that much memory.
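To illustrate that hibernation does not stop a process from working, here is a small sketch. It uses the `:hibernate` return value of a GenServer callback rather than calling Process.hibernate directly, and the module name `HibernateDemo` and its messages are made up for the example.

```elixir
defmodule HibernateDemo do
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)

  @impl true
  def init(initial), do: {:ok, initial}

  # Returning :hibernate asks the GenServer to hibernate until the next message arrives.
  @impl true
  def handle_cast({:add, n}, total), do: {:noreply, total + n, :hibernate}

  @impl true
  def handle_call(:total, _from, total), do: {:reply, total, total}
end

{:ok, pid} = HibernateDemo.start_link(0)
GenServer.cast(pid, {:add, 5})
Process.sleep(50)

# The server wakes up for this call and answers with its state intact.
total = GenServer.call(pid, :total)
IO.inspect(total)
```

So a hibernating process still drains its mailbox as soon as anything arrives, which is not what the original question needs.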

:erlang.suspend_process is the way to go, and then :erlang.resume_process to get them going again. Be aware that suspending processes can cause other processes to run into timeouts, which can cause problems.
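A minimal sketch of suspending and resuming, assuming a hypothetical worker that reports a tick to its parent every 10 ms (the `SuspendDemo` module and the `{:tick, n}` message are invented for the example). Only :erlang.suspend_process/1, :erlang.resume_process/1 and Process.info/2 come from the runtime.

```elixir
defmodule SuspendDemo do
  # Spawns a worker that sends {:tick, n} to `parent` every 10 ms.
  def start_worker(parent) do
    spawn(fn -> tick(parent, 0) end)
  end

  defp tick(parent, n) do
    send(parent, {:tick, n})
    Process.sleep(10)
    tick(parent, n + 1)
  end

  # Removes all pending {:tick, _} messages from the caller's mailbox and counts them.
  def drain_ticks(count \\ 0) do
    receive do
      {:tick, _} -> drain_ticks(count + 1)
    after
      0 -> count
    end
  end
end

worker = SuspendDemo.start_worker(self())
Process.sleep(100)

# Suspend: the worker stays alive and keeps its state, but is no longer scheduled.
true = :erlang.suspend_process(worker)
SuspendDemo.drain_ticks()
Process.sleep(100)
ticks_while_suspended = SuspendDemo.drain_ticks()
status_while_suspended = Process.info(worker, :status)

# Resume: the worker carries on from where it left off.
true = :erlang.resume_process(worker)
Process.sleep(100)
ticks_after_resume = SuspendDemo.drain_ticks()

IO.inspect({ticks_while_suspended, status_while_suspended, ticks_after_resume})
```

Note that a suspended GenServer cannot answer calls such as :sys.get_state/1, so anything that calls it will hit its timeout until the process is resumed.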

lud · November 4, 2022, 11:09pm

Ok so what you can do is:

each process has a queue in their state
whenever you send work to them, for instance in handle_call/3, they do not do the work but just add it to their respective queue
then they return 0 as a timeout from handle_call
when they receive the :timeout in handle_info/2, they pull one task from their queue if it is not empty, handle that task, and then return with a zero timeout from there too.
in their state, they also keep a flag to tell if they should work or not.

As soon as you send them a stop message (:pause in the demo below), they set their flag to false, ignore the timeout message, and return :infinity as a timeout.

I put up a quick and dirty demo:

defmodule Serv do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: opts[:name])
  end

  def init(opts) do
    {:ok, %{queue: :queue.new(), enabled: true, coworker: opts[:coworker], me: opts[:name]}}
  end

  def handle_call({:task, val}, _, state) do
    {:reply, :ok, %{state | queue: :queue.in(val, state.queue)}, timeout(state)}
  end

  def handle_call(:pause, _, state) do
    {:reply, :ok, %{state | enabled: false}, :infinity}
  end

  def handle_call(:resume, _, state) do
    {:reply, :ok, %{state | enabled: true}, 0}
  end

  def handle_info(:timeout, state) do
    state =
      case state.enabled do
        true -> do_work(state)
        false -> state
      end

    {:noreply, state, timeout(state)}
  end

  def handle_info({:your_turn, val}, state) do
    {:noreply, %{state | queue: :queue.in(val, state.queue)}, timeout(state)}
  end

  def do_work(%{queue: q, coworker: buddy} = state) do
    case :queue.out(q) do
      {:empty, _} ->
        state

      {{:value, val}, new_q} ->
        val = work(val)
        IO.puts("result from #{inspect(state.me)}: #{inspect(val)}")
        send(buddy, {:your_turn, val})
        %{state | queue: new_q}
    end
  end

  def work(val) do
    Process.sleep(100)
    val + 1
  end

  defp timeout(%{enabled: enabled}) do
    case enabled do
      true -> 0
      false -> :infinity
    end
  end
end

{:ok, a} = Serv.start_link(name: A, coworker: B)
{:ok, b} = Serv.start_link(name: B, coworker: A)

GenServer.call(a, {:task, 0})
GenServer.call(b, {:task, 1000})

Process.sleep(1000)

IO.puts("pause A")
GenServer.call(a, :pause)

Process.sleep(1000)

IO.puts("resume A")
GenServer.call(a, :resume)

Process.sleep(1000)

edit: the timeout/1 function should check if the queue is empty. If it is empty it should return :infinity even if enabled is true. Otherwise the process will loop again and again on a timeout for nothing.
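A sketch of what that revised check could look like, written as a standalone module so it can be run on its own (`ServTimeout` is a hypothetical name; in the demo the function would stay private inside Serv). One thing to watch for: the handlers in the demo call timeout(state) with the state from before the new item was queued, so with this check they would need to pass the updated state instead, otherwise the first task added to an empty queue would never be picked up.

```elixir
defmodule ServTimeout do
  # Zero timeout only when the server is enabled AND has queued work.
  def timeout(%{enabled: true, queue: queue}) do
    if :queue.is_empty(queue), do: :infinity, else: 0
  end

  # A paused server waits indefinitely for the next message.
  def timeout(%{enabled: false}), do: :infinity
end

empty_state = %{enabled: true, queue: :queue.new()}
busy_state = %{empty_state | queue: :queue.in(1, empty_state.queue)}
paused_state = %{busy_state | enabled: false}

IO.inspect(ServTimeout.timeout(empty_state))
IO.inspect(ServTimeout.timeout(busy_state))
IO.inspect(ServTimeout.timeout(paused_state))
```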