PersistentGenServer - Persist Your GenServers!

PersistentGenServer

PersistentGenServer makes your Generic Elixir (or Erlang) Servers Persistent!

It is currently in a relatively early development stage (I’d say about 75% towards a stable release).
As such, it is not yet available on Hex.

Example

In essence, the only thing you need to do to make your GenServer persistent is to replace the line

GenServer.start(YourModule, list_of_arguments, maybe_some_options)

with

PersistentGenServer.start(YourModule, list_of_arguments, maybe_some_options)
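To make the swap concrete, here is a minimal sketch of a callback module it could apply to. The `Counter` module and its `increment/1` helper are made up for illustration. Only the plain `GenServer.start/2` call is executed; the persistent variant is shown as a comment because it needs the library itself.

```elixir
defmodule Counter do
  use GenServer

  # Client-side convenience wrapper around GenServer.call/2.
  def increment(server), do: GenServer.call(server, :increment)

  @impl true
  def init(start_value), do: {:ok, start_value}

  @impl true
  def handle_call(:increment, _from, count) do
    # Reply with the new count and keep it as the new state.
    {:reply, count + 1, count + 1}
  end
end

# Plain GenServer: the state is gone as soon as the process stops.
{:ok, pid} = GenServer.start(Counter, 0)
1 = Counter.increment(pid)
2 = Counter.increment(pid)

# Persistent variant, as described above (not run here, requires the library):
# PersistentGenServer.start(Counter, 0)
```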

What does it do?

A Persistent GenServer will behave just like a normal GenServer, with the following differences:

  • It will keep track of the state changes that result from every call/cast made to it, and store these in the configured persistence layer (which implements the PersistentGenServer.Storage behaviour).
  • It will automatically stop after a specified timeout.
  • If it is called after it has been stopped, a new server with the last persisted state will be started.
  • The process’ state is removed from the persistent storage only in the case of a normal shutdown. As such, many kinds of crashes will result in the process being restarted with the last state before the crash.
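The four differences listed above can be sketched with nothing but a plain GenServer. This is not how the library is implemented: `SketchStorage` is a hypothetical ETS-backed stand-in (the real callbacks of the PersistentGenServer.Storage behaviour are not shown in this post), and `PersistedCounter`, the key `:player_1` and the 5 second idle timeout are all assumptions made for the example.

```elixir
defmodule SketchStorage do
  # Hypothetical stand-in for a storage adapter, backed by a public ETS table.
  @table :sketch_storage

  def setup, do: :ets.new(@table, [:named_table, :public, :set])
  def store(key, state), do: :ets.insert(@table, {key, state})
  def delete(key), do: :ets.delete(@table, key)

  def fetch(key) do
    case :ets.lookup(@table, key) do
      [{^key, state}] -> {:ok, state}
      [] -> :error
    end
  end
end

defmodule PersistedCounter do
  use GenServer

  # Idle time in milliseconds after which the server stops itself.
  @idle_timeout 5_000

  @impl true
  def init(key) do
    # Resume from the last persisted state, or start fresh at 0.
    count =
      case SketchStorage.fetch(key) do
        {:ok, saved} -> saved
        :error -> 0
      end

    {:ok, {key, count}, @idle_timeout}
  end

  @impl true
  def handle_call(:increment, _from, {key, count}) do
    new_count = count + 1
    # Persist after every state change.
    SketchStorage.store(key, new_count)
    {:reply, new_count, {key, new_count}, @idle_timeout}
  end

  @impl true
  def handle_info(:timeout, state) do
    # Idle stop: not a :normal exit, so the persisted state is kept.
    {:stop, {:shutdown, :idle_timeout}, state}
  end

  @impl true
  def terminate(:normal, {key, _count}), do: SketchStorage.delete(key)
  def terminate(_other_reason, _state), do: :ok
end

SketchStorage.setup()

{:ok, pid} = GenServer.start(PersistedCounter, :player_1)
1 = GenServer.call(pid, :increment)

# Simulate the idle timeout firing, then wait for the process to stop.
ref = Process.monitor(pid)
send(pid, :timeout)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# A fresh process for the same key resumes from the persisted count.
{:ok, pid2} = GenServer.start(PersistedCounter, :player_1)
2 = GenServer.call(pid2, :increment)

# A normal stop wipes the persisted state.
:ok = GenServer.stop(pid2, :normal)
:error = SketchStorage.fetch(:player_1)
```

Sending `:timeout` by hand only keeps the demo fast and deterministic; in normal operation the GenServer timeout delivers the same message after the idle period.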

Why is this useful?

Many systems are more easily modeled as a bunch of (potentially communicating) state machines, rather than modelling them around a relational database.

As a simple example, consider a (long-running) game: handling player inputs is much more natural in a state machine than in code that wraps a database. However, we (1) do want to persist what the player does, but (2) do not want to keep a GenServer running for every player, since only a fraction of them is playing at any given time.

After having encountered about three situations in which I was re-implementing a very similar ‘persistence’ layer for my GenServers, I decided to extract this logic, and make it more general by hiding the persistency logic as much as possible from the user of the system.

How does it work?

Internally, a special process registry (which wraps the Elixir.Registry by default, but can use any other registry you desire as well) keeps track of which processes are currently started, and which only exist as data persisted to disk.

You do not receive a PID as response from PersistentGenServer.start. Rather, you receive a symbolic PID that includes all the information needed to start the server again at a later time (essentially the arguments to PersistentGenServer.start).
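The symbolic-PID idea can be sketched on top of the standard Registry. Everything named here is an assumption for illustration: `SymbolicPidSketch`, `SketchRegistry` and `SketchCounter` are made up, and the symbolic PID is simply the tuple `{module, init_arg}`. The sketch covers only the start-on-demand lookup, without any persistence, so a restarted server begins again from its init argument.

```elixir
defmodule SketchCounter do
  use GenServer

  @impl true
  def init(start_value), do: {:ok, start_value}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
end

defmodule SymbolicPidSketch do
  # Hypothetical: the "symbolic PID" is just the data needed to (re)start the server.
  def start(module, init_arg) do
    symbolic = {module, init_arg}
    {:ok, _pid} = ensure_started(symbolic)
    {:ok, symbolic}
  end

  # Looks up the real process, starting it first if it is not running.
  def call(symbolic, request) do
    {:ok, pid} = ensure_started(symbolic)
    GenServer.call(pid, request)
  end

  defp ensure_started({module, init_arg} = symbolic) do
    name = {:via, Registry, {SketchRegistry, symbolic}}

    case GenServer.start(module, init_arg, name: name) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
    end
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: SketchRegistry)

{:ok, sym} = SymbolicPidSketch.start(SketchCounter, 10)
11 = SymbolicPidSketch.call(sym, :increment)

# Stop the underlying process; the symbolic PID stays usable.
[{pid, _value}] = Registry.lookup(SketchRegistry, sym)
:ok = GenServer.stop(pid)

# The next call starts a new process on demand. Without a persistence
# layer it starts from the init argument again, hence 11 and not 12.
11 = SymbolicPidSketch.call(sym, :increment)
```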

Configuration

PersistentGenServer is configured through Specify. As such, it can be configured in many different ways, and multiple differently-configured variants of the PersistentGenServer can run alongside each other.

To Dos before stable release

  • [ ] Stateful property tests to make sure there are no race conditions.
  • [x] Swap out process registries that PersistentGenServer.Registry wraps.
  • [ ] Let users choose between: (temporary/transient/permanent)
    • Wipe persistency for GenServer when it stops normally or crashes.
    • Wipe persistency for GenServer only when it stops normally.
    • Even restart GenServer from persistency when it crashed before.
  • [x] Other storage adapters.
  • [x] Timeout length before a process petrifies itself.
  • [ ] Make it configurable whether to write to the cache only on terminate or during each handle_*, trading efficiency against fault tolerance.
  • [x] A mapping function between the actual state and the state-to-be-persisted/reloaded, to hide ephemeral parts.
  • [ ] Improving the documentation and examples.

Installation

PersistentGenServer is not yet available on Hex, so for now it can be added to your deps as a Git dependency:

{:persistent_gen_server, git: "https://github.com/Qqwy/elixir_persistent_gen_server"}

I very much look forward to feedback from the community, to see in which ways you would like this to be developed further :slight_smile: .

First: Your repo seems to be private. Link to https://github.com/Qqwy/elixir_persistent_gen_server results in a 404.

This will be a nice lib, provided the following could be added in the future:

  1. start_link function together with the ability to start it under a Supervisor. Not sure if that is possible if one wants to keep the “stop after timeout and restart when needed” functionality in its current form.
  2. Recover state on system restart. If stopped properly, your lib seems to remove the state for all GenServers but there might be cases when someone wants to recover full state after properly restarting the OTP app (e.g. deployment).

Do you intend the “save” to always be automatic? I think that’s going to be problematic for a lot of people, who will point out that a GenServer crashes for a reason, and that reason is usually bad state. A GenServer should recover from the last known good state, which is often a blank state or a state restored from a database that provides strong data integrity guarantees.

I don’t necessarily think the above myself, as I can definitely see use cases where this is a good fit. But I do fear people will abuse it, and then end up with GenServers that have persisted state, crash, restart, load up bad state and crash again, forever. Or did you think of tackling that issue somehow?

Thank you! This has now been rectified :slight_smile:.

I tried this out, but it would require serious changes to OTP’s signaling: in its current form, when a PersistentGenServer quits because its timeout is reached, the Supervisor will think it has to be restarted right away.
So I do not think that links (either the bidirectional or the supervision kind) make a lot of sense for these kinds of processes. Monitors might be useful in certain cases, but most of the time you can just pretend that a PersistentGenServer exists if you have its symbolic PID, because it will be started on demand once you call it.

Having said that, one thing that needs to be figured out in the library is how to handle PersistentGenServer PIDs that have since been shut down gracefully. :thinking:

That is a very interesting problem that I did not think about yet! I definitely want to make the moment at which servers remove their state configurable. I wonder if there is a way to distinguish a shutdown that happens because the application itself shuts down. :thinking:

When using it in its most basic way, I intend the save of the state to happen always at the point the GenServer’s handle_cast/call/info call is done. This already makes it highly unlikely that a bad state is persisted, unless it took a sequence of multiple steps to trigger it.

One thing I’d like to provide (besides the option for a ‘mapping’ function to make only part of the state datastructure persistent) is the option to return from a handle_cast/call/info with a return value that indicates that you do not want to persist the outcome of this call directly.

Another possibility might be to keep track of a server that crashes multiple times in a short timespan, and kill it violently, destroying the data (or maybe roll it back multiple steps of the state, which would require keeping older versions of the state around, of course) at that time.

All PersistentGenServers stay under a DynamicSupervisor. It is the PersistentGenServer.GlobalSupervisor by default, but you can configure another supervisor for any PersistentGenServers you wish.
If a (Persistent)GenServer persistently misbehaves and restart-crashes multiple times in quick succession, then this supervisor will itself crash. This does not currently have the effect of resetting the state of the PersistentGenServer(s) (maybe this is something that we could optionally provide?), but it does mean that the crash caused by such bad behaviour will be propagated up the supervision tree.
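For reference, this is the standard OTP restart-intensity behaviour, which any plain DynamicSupervisor has. The sketch below is not the library’s PersistentGenServer.GlobalSupervisor, and `SupervisedSketch` is a made-up module; it only shows the `max_restarts` and `max_seconds` options that decide when a supervisor gives up and crashes itself.

```elixir
defmodule SupervisedSketch do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# More than 3 restarts within 5 seconds make the supervisor itself exit,
# which propagates the failure up the supervision tree.
{:ok, sup} =
  DynamicSupervisor.start_link(strategy: :one_for_one, max_restarts: 3, max_seconds: 5)

{:ok, _child} = DynamicSupervisor.start_child(sup, {SupervisedSketch, :initial_state})
```

The values 3 and 5 are also the defaults, so they are spelled out here only to make the knobs visible.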

Work on this slowly but surely progresses…

Today I have added the possibility to specify a transformation_function that you can use to alter the state that your GenServer has before it is persisted. This is for instance useful if parts of the state actually ought to be ephemeral.
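As a rough illustration of what such a function could look like: the sketch below drops the keys that should not survive a restart. The module name, the key names and the shape of the state are assumptions, and how the function is actually handed to the library is not shown here.

```elixir
defmodule TransformSketch do
  # Hypothetical list of state keys that are only meaningful
  # while the current process is alive.
  @ephemeral_keys [:socket, :timer_ref]

  # Returns the part of the state that is worth persisting.
  def to_persisted(state), do: Map.drop(state, @ephemeral_keys)
end

state = %{score: 42, level: 3, socket: self(), timer_ref: make_ref()}
true = TransformSketch.to_persisted(state) == %{score: 42, level: 3}
```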


One thing I am currently a little conflicted about is configuration: at the moment, the configuration of the GenServer is part of its identity. It is set when the GenServer is started, and is part of the symbolic PID that is passed around. That is probably suboptimal if you want to change how an already-existing persistent GenServer behaves.

However, since some of these settings alter the way the PID lookup happens (and control things like ‘how-to-restart-if-persisted’), I see no other way to do this. :thinking:

More (hammock-driven :grin: ) thinking needed…
