Peeper - a library to keep state between crashes and move processes across supervisors

Peeper is a library that preserves state across process crashes (including ETS tables and the process dictionary), and can also move processes between dynamic supervisors (including a distributed transfer across nodes).
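To make the idea of "state surviving a crash" concrete, here is a minimal sketch of one way a process dictionary can outlive its owner: a longer-lived process (an Agent here) keeps a copy, and the next incarnation restores it. The module `PdictKeeper` and its `snapshot/1` and `restore/1` helpers are hypothetical and are not Peeper's API; they only illustrate the general technique.

```elixir
defmodule PdictKeeper do
  # Hypothetical sketch (not Peeper's API): an Agent outlives the worker and
  # holds a copy of the worker's process dictionary between incarnations.

  # Copy the caller's user-level dictionary entries into the keeper.
  def snapshot(keeper) do
    entries = Enum.reject(Process.get(), fn {key, _value} -> internal?(key) end)
    Agent.update(keeper, fn _old -> entries end)
  end

  # Put every saved entry back into the caller's dictionary.
  def restore(keeper) do
    keeper
    |> Agent.get(& &1)
    |> Enum.each(fn {key, value} -> Process.put(key, value) end)
  end

  # OTP keeps bookkeeping under keys such as :"$ancestors"; skip those.
  defp internal?(key) when is_atom(key), do: String.starts_with?(Atom.to_string(key), "$")
  defp internal?(_key), do: false

  # Simulates a crash and a restart; returns the value seen by the second incarnation.
  def demo do
    {:ok, keeper} = Agent.start_link(fn -> [] end)
    parent = self()

    {first, ref} =
      spawn_monitor(fn ->
        Process.put(:foo, 42)
        snapshot(keeper)
        exit(:crash)
      end)

    receive do
      {:DOWN, ^ref, :process, ^first, :crash} -> :ok
    end

    spawn(fn ->
      restore(keeper)
      send(parent, {:restored, Process.get(:foo)})
    end)

    receive do
      {:restored, value} -> value
    end
  end
end
```

Calling `PdictKeeper.demo()` should return `42`: the first process dies, and the second one reads `:foo` back from the restored dictionary.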

It keeps both the process dictionary and private ETS tables private. The caveats: all async messages arriving during a process transfer are lost, and a huge ETS table has to be copied across nodes, which might not be an option.
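One standard ERTS mechanism that lets a `:private` ETS table survive its owner's death without being made public is the `:heir` option of `:ets.new/2`: when the owner terminates (even via `:kill`), ownership passes to the heir, which receives an `{:"ETS-TRANSFER", tab, from_pid, heir_data}` message. The sketch below demonstrates that option in isolation; whether Peeper itself relies on it is an assumption, and `HeirDemo` is a hypothetical name.

```elixir
defmodule HeirDemo do
  # Sketch of the ETS :heir option (a standard ERTS feature). Whether Peeper
  # relies on it is an assumption, not something stated in the post.

  # The keeper waits for ownership of the table and reports its contents.
  defp keeper(reply_to) do
    receive do
      {:"ETS-TRANSFER", tab, _from_pid, :heir_data} ->
        # The keeper is now the owner, so reading a :private table is allowed.
        send(reply_to, {:recovered, :ets.tab2list(tab)})
    end
  end

  def demo do
    parent = self()
    keeper_pid = spawn(fn -> keeper(parent) end)

    owner =
      spawn(fn ->
        tab = :ets.new(:my_ets, [:set, :private, {:heir, keeper_pid, :heir_data}])
        :ets.insert(tab, {:foo, 42})
        send(parent, :ready)

        # Stay alive until killed.
        receive do
          :never -> :ok
        end
      end)

    receive do
      :ready -> :ok
    end

    # Even a brutal :kill hands the table over to the heir instead of deleting it.
    Process.exit(owner, :kill)

    receive do
      {:recovered, rows} -> rows
    after
      1_000 -> :timeout
    end
  end
end
```

`HeirDemo.demo()` should return `[{:foo, 42}]`. The table stays `:private` the whole time; only its owner changes.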

Excerpt from tests:

    {:ok, source_pid} = DynamicSupervisor.start_link(name: SDS)
    {:ok, pid} = DynamicSupervisor.start_child(SDS, {MyProcess, state: 0, name: P})

    assert 0 == Peeper.call(P, :state)
    # creates a private ETS with ‹CONTENT›
    assert :ok == Peeper.cast(pid, {:create_ets, :my_ets})
    # amends process dictionary
    assert :ok == Peeper.cast(pid, {:set_pd, :foo, 42})
    # amends process state
    assert :ok == Peeper.cast(pid, :inc)
    assert 1 == Peeper.call(P, :state)

    Process.exit(Peeper.Supervisor.worker(pid), :kill)

    assert 1 == Peeper.call(P, :state)
    assert 42 == Peeper.call(P, {:get_pd, :foo})
    assert ‹ETS CONTENT› = Peeper.call(P, {:ets, :my_ets})

    {:ok, target_pid} = DynamicSupervisor.start_link(name: TDS)

    Peeper.transfer(P, source_pid, target_pid)

    assert [{:undefined, pid, :supervisor, _}] = DynamicSupervisor.which_children(TDS)
    assert pid == GenServer.whereis(P)
    assert 1 == Peeper.call(P, :state)
    assert 42 == Peeper.call(P, {:get_pd, :foo})
    assert ‹ETS CONTENT› = Peeper.call(P, {:ets, :my_ets})
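For comparison, a transfer between two `DynamicSupervisor`s can be approximated with plain OTP calls: snapshot the state, terminate the child under the source, and start a fresh child under the target with that state. This is a hypothetical `NaiveTransfer` helper with a hypothetical `Counter` server, not `Peeper.transfer/3`; it keeps only the GenServer state (no process dictionary or ETS) and shows why async messages sent mid-transfer get lost.

```elixir
defmodule Counter do
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, Keyword.fetch!(opts, :state), name: Keyword.fetch!(opts, :name))
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast(:inc, state), do: {:noreply, state + 1}
end

defmodule NaiveTransfer do
  # Hypothetical helper, not Peeper.transfer/3: snapshot the state, stop the
  # child under `source`, and start a fresh one under `target` with that state.
  def transfer(name, source, target) do
    pid = GenServer.whereis(name)
    state = GenServer.call(pid, :state)
    :ok = DynamicSupervisor.terminate_child(source, pid)
    # Casts sent to `name` between the call above and the restart below are lost.
    DynamicSupervisor.start_child(target, {Counter, state: state, name: name})
  end

  def demo do
    {:ok, source} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, target} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, _pid} = DynamicSupervisor.start_child(source, {Counter, state: 0, name: P})

    GenServer.cast(P, :inc)
    {:ok, _new_pid} = transfer(P, source, target)

    # {children under source, children under target, state after the move}
    {length(DynamicSupervisor.which_children(source)),
     length(DynamicSupervisor.which_children(target)), GenServer.call(P, :state)}
  end
end
```

`NaiveTransfer.demo()` should return `{0, 1, 1}`. The `:inc` cast is not lost here only because it is sent before the snapshot call from the same process, so it is processed first.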

Have you tried this? That way you also keep all links, the pdict, pending messages, monitors, and all other pid-related state, such as Registry entries (if any), preserving as much state as possible:

    @impl true
    def terminate(reason, state) do
      # Check the reason first to make sure you actually want to "restart"

      # Hibernate just to reset the stack, then re-enter the gen_server loop
      :proc_lib.hibernate(:gen_server, :enter_loop, [__MODULE__, [], state])
    end
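Filling in the "check the reason first" part, a self-contained version might look like the sketch below. It assumes you only want to resurrect on a raised `RuntimeError` (Elixir passes `{exception, stacktrace}` as the reason in that case); `Resurrecting` is a hypothetical module name. The server resumes with the state it had before the failing callback.

```elixir
defmodule Resurrecting do
  use GenServer

  def start_link(state), do: GenServer.start_link(__MODULE__, state)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast(:inc, state), do: {:noreply, state + 1}
  def handle_cast(:boom, _state), do: raise("boom")

  # Assumption: only a raised RuntimeError is worth "restarting" in place.
  @impl true
  def terminate({%RuntimeError{}, _stacktrace}, state) do
    # Discard the stack and re-enter the gen_server loop with the last good state.
    :proc_lib.hibernate(:gen_server, :enter_loop, [__MODULE__, [], state])
  end

  def terminate(_reason, _state), do: :ok

  # Returns {still alive?, state after inc -> boom -> inc}.
  def demo do
    {:ok, pid} = start_link(0)
    GenServer.cast(pid, :inc)
    GenServer.cast(pid, :boom)
    GenServer.cast(pid, :inc)
    value = GenServer.call(pid, :state)
    {Process.alive?(pid), value}
  end
end
```

`Resurrecting.demo()` should return `{true, 2}`: the pid never changes, so links, monitors and registrations stay intact. One trade-off: if the crash happens inside `handle_call/3`, no reply is ever sent, so that caller waits until its call times out.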

The terminate/2 callback is not guaranteed to be called, unfortunately.


It is called when an exception is raised. It is not called only when the supervisor terminates the process with :kill, when some other process kills the GenServer, or when a memory limit is hit. Usually, when terminate is not called, the process is not supposed to be restarted.
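These claims are easy to check with a small probe. The sketch below uses a hypothetical `Probe` GenServer that reports back from `terminate/2`; it covers a raised exception, a `:kill`, and a `:shutdown` exit signal sent to a server that does not trap exits.

```elixir
defmodule Probe do
  use GenServer

  # `report_to` receives {:terminate_called, reason} if terminate/2 runs.
  def start(report_to), do: GenServer.start(__MODULE__, report_to)

  @impl true
  def init(report_to), do: {:ok, report_to}

  @impl true
  def handle_cast(:boom, _report_to), do: raise("boom")

  @impl true
  def terminate(reason, report_to) do
    send(report_to, {:terminate_called, reason})
    :ok
  end

  # Returns true if terminate/2 ran after `stop_fun` brought the server down.
  def terminate_called?(stop_fun) do
    {:ok, pid} = start(self())
    ref = Process.monitor(pid)
    stop_fun.(pid)

    # Messages from one process arrive in order, so any report precedes :DOWN.
    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    receive do
      {:terminate_called, _reason} -> true
    after
      0 -> false
    end
  end
end
```

Expected results: `Probe.terminate_called?(&GenServer.cast(&1, :boom))` is `true`, while `&Process.exit(&1, :kill)` and `&Process.exit(&1, :shutdown)` both give `false`, the latter because the server does not trap exits.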

Plus, these two approaches can be combined.

I definitely recall some issues where terminate/2 was not called (maybe dynamic supervision of a process spawned by an unrelated process? throw/1? I don't recall, unfortunately).

But I just spent an hour trying to make it skip terminate/2 and failed. That said, I'll try the :proc_lib.hibernate/3 approach; thank you very much for pointing it out!