Introduction
Introducing RaRegistry: A Distributed, Raft-Backed Process Registry for Elixir
If you’re building distributed Elixir applications and need a reliable, consistent process registry, RaRegistry is a new library worth exploring. It offers a drop-in alternative to Elixir’s built-in Registry, but with distributed consensus powered by Ra, RabbitMQ’s implementation of the Raft protocol.
Features:
- Support for both :unique and :duplicate registration modes
- Automatic process monitoring and cleanup
- Built on Ra, RabbitMQ’s implementation of the Raft consensus protocol
- Strong consistency for regular operations during normal cluster operation
- Familiar API similar to Elixir’s built-in Registry
- Enhanced recovery mechanisms for handling abrupt node down scenarios like SIGKILL
- Seamless integration with GenServer via :via tuple registration
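For readers unfamiliar with :via tuples: GenServer hands name registration to whatever module appears in the tuple, and that module must export register_name/2, unregister_name/1, whereis_name/1 and send/2. The sketch below is a deliberately minimal, single-node illustration of that contract. ViaSketch and its Agent-backed store are hypothetical names made up for this example; they say nothing about how RaRegistry implements these callbacks.

```elixir
defmodule ViaSketch do
  # Hypothetical, local-only illustration of the :via callback contract.
  # Not RaRegistry code: no replication, no monitoring of dead processes.
  import Kernel, except: [send: 2]

  # Backing store: a named Agent holding %{name => pid}.
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Must return :yes when the name was free, :no when it is already taken.
  def register_name(name, pid) do
    Agent.get_and_update(__MODULE__, fn names ->
      if Map.has_key?(names, name) do
        {:no, names}
      else
        {:yes, Map.put(names, name, pid)}
      end
    end)
  end

  def unregister_name(name) do
    Agent.update(__MODULE__, &Map.delete(&1, name))
  end

  # Must return the pid, or :undefined when nothing is registered.
  def whereis_name(name) do
    Agent.get(__MODULE__, &Map.get(&1, name, :undefined))
  end

  # Delivers a message to the registered process and returns its pid.
  def send(name, msg) do
    case whereis_name(name) do
      :undefined ->
        exit({:badarg, {name, msg}})

      pid ->
        Kernel.send(pid, msg)
        pid
    end
  end
end
```

With this in place, a process started with name: {:via, ViaSketch, "user_123"} can be reached through the same tuple, and a second start under the same name returns {:error, {:already_started, pid}}.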
Usage
Dependency
def deps do
  [
    {:ra_registry, "~> 0.1.2"}
  ]
end
Example implementation
defmodule MyApp do
  use Application

  # Add RaRegistry to your application's supervision tree
  @impl true
  def start(_type, _args) do
    children = [
      # Start RaRegistry after any node discovery mechanism, such as libcluster.
      # Options for the underlying :ra cluster go under ra_config.
      # wait_for_nodes_range_ms is a range of milliseconds from which a random
      # wait is picked, so nodes have time to connect.
      {
        RaRegistry,
        keys: :unique, # or :duplicate
        name: MyApp.Registry,
        ra_config: %{data_dir: ~c"/tmp/ra"},
        wait_for_nodes_range_ms: 3000..5000
      },
      # Other children in your supervision tree...
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
defmodule MyApp.Server do
  use GenServer

  def start_link(opts) do
    # Start with an empty map so the :get and :set calls below can use Map functions
    GenServer.start_link(__MODULE__, %{}, name: via_tuple(opts[:id]))
  end

  def call(id, message) do
    GenServer.call(via_tuple(id), message)
  end

  defp via_tuple(id), do: {:via, RaRegistry, {MyApp.Registry, id}}

  # GenServer implementation
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
  def handle_call({:set, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
end
# Then, in your application code:
{:ok, pid} = MyApp.Server.start_link(id: "user_123")
# This call will work from any node in the cluster
MyApp.Server.call("user_123", {:set, :name, "John"})
MyApp.Server.call("user_123", {:get, :name}) # => "John"
# Starting the same id again should return :already_started, regardless of which node you call from
{:error, {:already_started, ^pid}} = MyApp.Server.start_link(id: "user_123")
Consistency and Recovery
Consistency Model
RaRegistry offers these consistency guarantees:
- Normal Operation: Operations use the Raft consensus protocol via Ra, providing strong consistency when a majority of nodes are available
- State Machine Atomicity: Operations within the Ra state machine are atomic and either fully succeed or have no effect
- Best-Effort Recovery: During failure scenarios like SIGKILL of the leader, our implementation employs aggressive recovery mechanisms that prioritize cluster recovery
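To make "a majority of nodes" concrete: Raft needs more than half of the cluster members to agree before an operation commits. The arithmetic can be sketched as follows. QuorumSketch is a hypothetical helper for illustration only, not part of RaRegistry or Ra.

```elixir
defmodule QuorumSketch do
  # Hypothetical helper: plain arithmetic, no calls into Ra or RaRegistry.

  # Smallest number of members that forms a strict majority.
  def majority(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    div(cluster_size, 2) + 1
  end

  # How many members can be unreachable while a majority remains.
  def tolerated_failures(cluster_size) do
    cluster_size - majority(cluster_size)
  end

  # True when enough members are reachable to form a majority.
  def available?(cluster_size, reachable) do
    reachable >= majority(cluster_size)
  end
end
```

For example, QuorumSketch.tolerated_failures(3) is 1 and QuorumSketch.tolerated_failures(5) is 2, while a 4-node cluster still tolerates only 1 failure, which is why odd cluster sizes are the usual choice.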
It’s important to understand that:
- The custom recovery mechanisms we’ve implemented extend beyond the standard Raft protocol
- After recovery, the system returns to a consistent state, though some in-flight operations might result in errors due to incomplete execution
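Because in-flight operations can fail while the cluster recovers, callers may want to retry them. Here is a minimal sketch, assuming a failed operation shows up either as an {:error, reason} return value or as an exit (GenServer.call/2 exits when the target process cannot be found or the call times out). RetrySketch and its default values are hypothetical and not part of RaRegistry.

```elixir
defmodule RetrySketch do
  # Hypothetical helper, not part of RaRegistry.
  # Runs `fun` up to `attempts` times, sleeping `delay_ms` between tries.
  # A try counts as failed when `fun` returns {:error, _} or exits.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 100)
      when is_function(fun, 0) and attempts > 0 do
    case safe_run(fun) do
      {:error, _reason} when attempts > 1 ->
        Process.sleep(delay_ms)
        with_retry(fun, attempts - 1, delay_ms)

      result ->
        # Either a success, or the final failure once attempts are used up.
        result
    end
  end

  # Converts an exit into an {:error, {:exit, reason}} tuple.
  defp safe_run(fun) do
    fun.()
  catch
    :exit, reason -> {:error, {:exit, reason}}
  end
end
```

With the example server above, this could be used as RetrySketch.with_retry(fn -> MyApp.Server.call("user_123", :ping) end). Retrying is only safe for operations that can run twice without harm, since a call that timed out may still have been executed.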
Recovery Capabilities
RaRegistry includes specialized recovery mechanisms to handle various failure scenarios:
- Automatic leader election after clean node failures
- Emergency recovery procedures for SIGKILL scenarios
- Self-healing mechanisms when nodes rejoin the cluster
- Cleanup of dead process registrations
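The last item, cleanup of dead process registrations, relies on process monitoring. As a single-node illustration of the idea only (RaRegistry's replicated implementation is not shown here), the sketch below uses Process.monitor/1 and drops an entry when the matching :DOWN message arrives. CleanupSketch is a hypothetical module.

```elixir
defmodule CleanupSketch do
  # Hypothetical, local-only registry that forgets processes when they die.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)
  def register(server, key, pid), do: GenServer.call(server, {:register, key, pid})
  def lookup(server, key), do: GenServer.call(server, {:lookup, key})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:register, key, pid}, _from, state) do
    case state do
      %{^key => {existing, _ref}} ->
        {:reply, {:error, {:already_registered, existing}}, state}

      _ ->
        # Monitor the process so we are told when it terminates.
        ref = Process.monitor(pid)
        {:reply, :ok, Map.put(state, key, {pid, ref})}
    end
  end

  def handle_call({:lookup, key}, _from, state) do
    reply =
      case state do
        %{^key => {pid, _ref}} -> {:ok, pid}
        _ -> :error
      end

    {:reply, reply, state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Drop every entry registered under the monitor that just fired.
    remaining =
      state
      |> Enum.reject(fn {_key, {_pid, entry_ref}} -> entry_ref == ref end)
      |> Map.new()

    {:noreply, remaining}
  end
end
```

Here the cleanup happens in one process on one node. In a replicated registry the removal would additionally have to be agreed on by the cluster, which this sketch does not attempt.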