DGen - A distributed GenServer
I love GenServers. Only two things stop me from writing an entire app with them:
- Durability: The state is lost when the process goes down.
- High availability: The functionality is unavailable when the process goes down.
What if we could guarantee the GenServer never went down? Could we build a stateful application without a database?
Many Erlang and Elixir developers have this fantasy at some point in their journey. But how close can we actually get to the dream? I’d like to find out with DGen. This is v0.1.0 stuff, early days.
What DGen does
DGen provides a “distributed GenServer” (DGenServer). It’s meant to work just like a GenServer, but the message queue and the module state are durably stored in FoundationDB, with other backends possible.
Quick example
Our simplest example looks almost exactly like a GenServer.
defmodule Counter do
  use DGenServer

  def start(tenant), do: DGenServer.start(__MODULE__, [], tenant: tenant)
  def increment(pid), do: DGenServer.cast(pid, :increment)
  def value(pid), do: DGenServer.call(pid, :value)

  @impl true
  def init([]), do: {:ok, 0}

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast(:increment, state), do: {:noreply, state + 1}
end
However, the state lives on beyond the lifetime of the original Elixir process.
{:ok, pid} = Counter.start(tenant)
Counter.increment(pid)
Counter.increment(pid)
2 = Counter.value(pid)
# Restart the process
Process.exit(pid, :kill)
{:ok, pid2} = Counter.start(tenant)
2 = Counter.value(pid2) # State persisted!
The tenant argument is the only unique piece here. This tells DGenServer where to persist the queue and state in the datastore.
Beyond the basics
The simple example demonstrates the durable state. But the benefits inherited from a serializable distributed system are all here:
- Start one DGenServer per node, with state mutations processed exactly once, without RPC coordination or process registration.
- Separate where messages are pushed from where they are processed. Producers can run anywhere in the cluster, while consumers — the processes that mutate state — can be pinned to specific nodes or hardware.
- Perform side effects such as sending emails or performing network requests, with similar transactional guarantees.
Embracing side effects
The simplest side effect is one that happens after a state mutation. For example, a log message is a side effect! Your callback can optionally return a function to be executed after the state change is committed.
# `require Logger` is needed at the module level, because Logger.info/1 is a macro
def handle_cast(:increment, state) do
  action = &Logger.info("Counter is now #{&1}") # runs after commit
  {:noreply, state + 1, [action]}
end
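To make the ordering concrete, here is a minimal sketch of how a runtime could handle that return value. It is not DGen's implementation: `PostCommitSketch` is a hypothetical module, an `Agent` stands in for the transactional datastore, and I'm assuming each action receives the committed state, as the `&1` above suggests.

```elixir
defmodule PostCommitSketch do
  @moduledoc "Hypothetical model of post-commit actions; not part of DGen."

  # Normalize the two-element form so both shapes share one code path.
  def apply_result(store, {:noreply, new_state}) do
    apply_result(store, {:noreply, new_state, []})
  end

  def apply_result(store, {:noreply, new_state, actions}) when is_list(actions) do
    # "Commit": the Agent stands in for the transactional datastore.
    :ok = Agent.update(store, fn _old -> new_state end)

    # Actions run only after the commit above has succeeded, and each one
    # receives the committed state (an assumption, see the lead-in).
    Enum.each(actions, fn action -> action.(new_state) end)
    :ok
  end
end
```

If the commit step raised, no action would run, which is the property that makes a log line or notification safe to attach to a state change.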
On the other hand, when a side effect needs to update the state, we must lock the queue so that no other messages are processed while the side effect executes outside of the transaction.
def handle_cast(:send_email, state) do
  # executes inside the transaction
  {:lock, state}
end

def handle_locked(:cast, :send_email, state) do
  # executes outside of a transaction
  Req.post(...)
  {:noreply, %{state | sent: state.sent + 1}}
end
Under the hood - performance characteristics
Message Queue: The critical piece of DGenServer is the message queue. A caller must be able to push new messages onto the queue with serializability guarantees and high concurrency. This is achieved by using versionstamped keys, which exactly tie the underlying commit order with the key order.
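A small in-memory model may help show why versionstamped keys give the queue a total order. This sketch does not touch FoundationDB or erlfdb, and `VersionstampQueueSketch` is a hypothetical name. It mimics the FoundationDB versionstamp layout (an 8-byte commit version followed by a 2-byte order within the commit), with a local counter standing in for the version the database would assign at commit time.

```elixir
defmodule VersionstampQueueSketch do
  @moduledoc "In-memory model of a versionstamp-keyed queue; not DGen's code."

  defstruct next_version: 1, entries: %{}

  def new, do: %__MODULE__{}

  # Commit a batch of messages as one "transaction". Every message in the
  # batch shares the commit version and gets an increasing in-batch order.
  def commit(%__MODULE__{next_version: version, entries: entries} = queue, messages)
      when is_list(messages) do
    entries =
      messages
      |> Enum.with_index()
      |> Enum.reduce(entries, fn {message, order}, acc ->
        # Big-endian, fixed-width key: byte order equals numeric order.
        Map.put(acc, <<version::64, order::16>>, message)
      end)

    %{queue | next_version: version + 1, entries: entries}
  end

  # Reading in key order returns messages in commit order.
  def drain(%__MODULE__{entries: entries}) do
    entries
    |> Enum.sort_by(fn {key, _message} -> key end)
    |> Enum.map(fn {_key, message} -> message end)
  end
end
```

Because the key is derived from the commit itself, producers never need to agree on a sequence number up front, which is what keeps concurrent pushes from conflicting with each other.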
Writes: The module state could be stored as a single term_to_binary blob. However, doing so would amplify the writes needed for incremental changes, since any small change would rewrite the whole blob. Instead, DGenServer adopts the design decisions of LiveView’s assigns and component lists: a module state consisting of a map with atom keys, or a list whose elements carry string ids, is optimized for incremental diffs on write. This means that standard Elixir structs are the preferred terms for the DGenServer module state.
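As a rough illustration of the idea (not DGen's actual encoding), the sketch below diffs two map-shaped states and reports only the keys that would need to be written or cleared. `StateDiffSketch` and its return shape are assumptions made for this example.

```elixir
defmodule StateDiffSketch do
  @moduledoc "Hypothetical per-key diff of a map-shaped state; not DGen's code."

  # Returns {puts, deletes}:
  #   puts    - keys that are new or whose value changed, with the new value
  #   deletes - keys that disappeared from the state (sorted for stable output)
  # Map.to_list/1 is used so that structs work too.
  def diff(old, new) when is_map(old) and is_map(new) do
    puts =
      for {key, value} <- Map.to_list(new),
          Map.fetch(old, key) !== {:ok, value},
          into: %{},
          do: {key, value}

    deletes =
      for {key, _value} <- Map.to_list(old),
          not Map.has_key?(new, key),
          do: key

    {puts, Enum.sort(deletes)}
  end
end
```

With a state like `%{count: 1, name: "a"}`, bumping `count` yields a single put for `:count`, while `:name` is left untouched in the datastore.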
Reads: And finally, DGenServer will cache the module state in memory to improve performance, with perfect cache invalidation. A single hot consumer will never have to read the full state, unless it restarts.
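One way to picture such a cache is a version check: the consumer keeps the state together with the version it was read at, and only re-reads the full state when the stored version has moved. The sketch below is a guess at the general shape, not a description of DGen's internals; `CachedStateSketch` and the plain map standing in for the datastore are hypothetical.

```elixir
defmodule CachedStateSketch do
  @moduledoc "Hypothetical version-checked state cache; not DGen's code."

  # The "datastore": a version counter plus the full state.
  def new_store(state), do: %{version: 1, state: state}

  # Cache hit: the cached version equals the stored version, so the
  # full state does not need to be read again.
  def read(%{version: version}, {version, state} = cache), do: {state, cache, :hit}

  # Cache miss (no cache yet, or a stale version): read the full state.
  def read(%{version: version, state: state}, _stale_or_nil) do
    {state, {version, state}, :miss}
  end

  # A writer bumps the version and refreshes its own cache in the same
  # step, so a single hot consumer keeps hitting its cache.
  def write(%{version: version} = store, new_state) do
    updated = %{store | version: version + 1, state: new_state}
    {updated, {version + 1, new_state}}
  end
end
```

In this model a consumer that just wrote gets `:hit` on its next read, while any other consumer holding the older version gets `:miss` and reloads.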
These components together allow for adequate performance. Still, you shouldn’t replace all your GenServers tomorrow. DGenServer should be reserved for stateful mutations that require durability and high availability guarantees, such that the performance tradeoff is acceptable.
Let it crash?
A DGenServer consumer can crash, just like any other Elixir process. But a key difference here is that the message queue is durable. If the crash is due to a poison message, a supervisor restart of a DGenServer consumer will simply try to process the same message again. DGen has yet to learn what this means in a production setting. The crash semantics themselves are well-defined, but unlike with a GenServer, system recovery is not automatic. An operator may have to manually delete a poison message from the queue, an operation that is not possible with a standard GenServer.
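The poison-message loop is easy to reproduce in a toy model. In the sketch below, a plain list stands in for the durable queue and a rescued exception stands in for a consumer crash; `DurableQueueSketch` and its functions are hypothetical and only illustrate the semantics described above.

```elixir
defmodule DurableQueueSketch do
  @moduledoc "Toy model of a durable queue with a poison message; not DGen's code."

  # Process the head message. It leaves the queue only if the handler
  # returns normally, as if the dequeue committed with the state change.
  def step([], state, _handler), do: {:empty, [], state}

  def step([message | rest] = queue, state, handler) do
    try do
      {:ok, rest, handler.(message, state)}
    rescue
      # "Crash": nothing was committed, so queue and state are unchanged
      # and a restarted consumer sees the same head message again.
      _error -> {:crashed, queue, state}
    end
  end

  # Manual operator intervention: drop the message at the head of the queue.
  def drop_head([_poison | rest]), do: rest
end
```

Calling `step/3` repeatedly on a queue whose head always raises returns `:crashed` every time, and only `drop_head/1` lets the remaining messages through.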
Other backends?
DGen requires a strictly serializable key-value datastore with transactions, like FoundationDB, to provide consistency guarantees that match the semantics of a GenServer. The first backend implementation available in DGen is FoundationDB, via erlfdb. But we hope that other datastores providing a similar feature set can be implemented as alternative backends (such as Hobbes, Bedrock, etc.). I’m very open to flexing the current backend behaviour (:dgen_backend) to support other projects.
Links
- Hex
- HexDocs
- :dgen_server docs
- GitHub:
Community input
So I’m continuing my obsession with exploring different kinds of state engines on the BEAM. I know there are like-minded folks around, so I’d love to hear thoughts and feedback about the approach. This project isn’t meant to be integrated into your production app today, but I’m hoping it can evolve into something useful.