Ra_registry — A Distributed Registry based on Raft using rabbitmq/ra

Introduction

Introducing RaRegistry: A Distributed, Raft-Backed Process Registry for Elixir

If you’re building distributed Elixir applications and need a reliable, consistent process registry, RaRegistry is a new library worth exploring. It offers a drop-in alternative to Elixir’s built-in Registry, but with distributed consensus powered by Ra, RabbitMQ’s implementation of the Raft protocol.

Features:

  • Support for both :unique and :duplicate registration modes
  • Automatic process monitoring and cleanup
  • Built on Ra, RabbitMQ’s implementation of the Raft consensus protocol
  • Strongly consistent operations while the cluster is running normally (a majority of nodes available)
  • Familiar API similar to Elixir’s built-in Registry
  • Enhanced recovery mechanisms for abrupt node failures such as SIGKILL
  • Seamless integration with GenServer via the :via tuple registration
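
For readers who have not used the two registration modes, here is a small sketch of how they differ. It uses Elixir's built-in Registry rather than RaRegistry, on the assumption that the "familiar API" point above means the semantics carry over; the registry names are made up for the example.

```elixir
# Illustration with Elixir's built-in Registry, NOT RaRegistry.
# Demo.UniqueRegistry and Demo.DuplicateRegistry are made-up names.
{:ok, _} = Registry.start_link(keys: :unique, name: Demo.UniqueRegistry)
{:ok, _} = Registry.start_link(keys: :duplicate, name: Demo.DuplicateRegistry)

# :unique mode: a key can be held by at most one registration.
{:ok, _} = Registry.register(Demo.UniqueRegistry, "user_123", :first)
unique_result = Registry.register(Demo.UniqueRegistry, "user_123", :second)
# unique_result is {:error, {:already_registered, pid}}, where pid is the current holder

# :duplicate mode: the same key can be registered many times.
{:ok, _} = Registry.register(Demo.DuplicateRegistry, "topic", :a)
{:ok, _} = Registry.register(Demo.DuplicateRegistry, "topic", :b)

# Collect the values stored under the duplicated key (sorted, since order is not guaranteed).
duplicate_values =
  Demo.DuplicateRegistry
  |> Registry.lookup("topic")
  |> Enum.map(fn {_pid, value} -> value end)
  |> Enum.sort()
```

The :unique mode is what makes the :via tuple usable for naming a single GenServer, while :duplicate suits cases where several processes share one key.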

Usage

Dependency

def deps do
  [
    {:ra_registry, "~> 0.1.2"}
  ]
end

Example implementation

defmodule MyApp do
  use Application

  # Add RaRegistry to your application supervision tree
  @impl true
  def start(_type, _args) do
    children = [
      # Start RaRegistry after any node discovery mechanism, such as libcluster.
      # Any configuration for the :ra cluster goes under :ra_config.
      # :wait_for_nodes_range_ms is a range in milliseconds from which a random
      # wait is picked, giving nodes time to connect properly.
      {
        RaRegistry,
        keys: :unique, # or :duplicate
        name: MyApp.Registry,
        ra_config: %{data_dir: ~c"/tmp/ra"},
        wait_for_nodes_range_ms: 3000..5000
      },
      
      # Other children in your supervision tree...
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule MyApp.Server do
  use GenServer
  
  def start_link(opts) do
    # The initial state is a map so the :get/:set handlers below can use Map.get/Map.put
    GenServer.start_link(__MODULE__, %{}, name: {:via, RaRegistry, {MyApp.Registry, opts[:id]}})
  end
  
  def call(id, message) do
    GenServer.call(via_tuple(id), message)
  end

  defp via_tuple(id), do: {:via, RaRegistry, {MyApp.Registry, id}}
  
  # GenServer implementation
  def init(state), do: {:ok, state}
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
  def handle_call({:set, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
end

# Then, in your application code:
{:ok, pid} = MyApp.Server.start_link(id: "user_123")

# This call will work from any node in the cluster
MyApp.Server.call("user_123", {:set, :name, "John"})
MyApp.Server.call("user_123", {:get, :name}) # => "John"

# Should return :already_started no matter which node you try to start the server on
{:error, {:already_started, ^pid}} = MyApp.Server.start_link(id: "user_123")

Consistency and Recovery

Consistency Model

RaRegistry offers these consistency guarantees:

  • Normal Operation: Operations use the Raft consensus protocol via Ra, providing strong consistency when a majority of nodes are available
  • State Machine Atomicity: Operations within the Ra state machine are atomic and either fully succeed or have no effect
  • Best-Effort Recovery: During failure scenarios like SIGKILL of the leader, our implementation employs aggressive recovery mechanisms that prioritize cluster recovery
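
To make "a majority of nodes" concrete, here is the standard Raft quorum arithmetic as a short sketch. The QuorumSketch module is hypothetical and not part of RaRegistry or Ra; it only shows how many members must be reachable for a given cluster size.

```elixir
defmodule QuorumSketch do
  # Hypothetical helper, not part of RaRegistry or Ra.

  # Smallest number of members that forms a strict majority.
  def quorum(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    div(cluster_size, 2) + 1
  end

  # Number of members that can be lost while a majority still remains.
  def tolerated_failures(cluster_size), do: cluster_size - quorum(cluster_size)

  # True when enough members are reachable to form a majority.
  def majority_available?(cluster_size, reachable), do: reachable >= quorum(cluster_size)
end

for n <- [3, 4, 5] do
  IO.puts(
    "#{n} nodes: quorum #{QuorumSketch.quorum(n)}, " <>
      "tolerates #{QuorumSketch.tolerated_failures(n)} failure(s)"
  )
end
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual choice.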

It’s important to understand that:

  • The custom recovery mechanisms we’ve implemented extend beyond the standard Raft protocol
  • After recovery, the system returns to a consistent state, though some in-flight operations might result in errors due to incomplete execution

Recovery Capabilities

RaRegistry includes specialized recovery mechanisms to handle various failure scenarios:

  • Automatic leader election after clean node failures
  • Emergency recovery procedures for SIGKILL scenarios
  • Self-healing mechanisms when nodes rejoin the cluster
  • Cleanup of dead process registrations
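
As an illustration of the last point, cleanup of dead registrations is commonly built on process monitors. The sketch below is a single-node toy, not RaRegistry's actual implementation: CleanupSketch and its functions are hypothetical, and there is no Raft replication involved. It only shows the monitor-and-remove pattern.

```elixir
defmodule CleanupSketch do
  # Hypothetical single-node toy registry; NOT how RaRegistry is implemented.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{})
  def register(server, key, pid), do: GenServer.call(server, {:register, key, pid})
  def lookup(server, key), do: GenServer.call(server, {:lookup, key})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:register, key, pid}, _from, state) do
    case state do
      %{^key => {existing, _ref}} ->
        {:reply, {:error, {:already_registered, existing}}, state}

      _ ->
        # Monitor the process so we are told when it dies.
        ref = Process.monitor(pid)
        {:reply, :ok, Map.put(state, key, {pid, ref})}
    end
  end

  def handle_call({:lookup, key}, _from, state) do
    reply =
      case state do
        %{^key => {pid, _ref}} -> {:ok, pid}
        _ -> :error
      end

    {:reply, reply, state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Drop every entry whose monitor reference matches the dead process.
    cleaned =
      state
      |> Enum.reject(fn {_key, {_pid, entry_ref}} -> entry_ref == ref end)
      |> Map.new()

    {:noreply, cleaned}
  end
end

{:ok, registry} = CleanupSketch.start_link()
worker = spawn(fn -> receive do: (:stop -> :ok) end)
:ok = CleanupSketch.register(registry, "user_123", worker)
{:ok, ^worker} = CleanupSketch.lookup(registry, "user_123")
```

Once `worker` exits, the :DOWN message removes its entry and the key becomes free for a new registration.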

Great work, it’s wonderful to see more consistent distributed libraries/systems being built in Elixir!

I think it’s very important to document exactly what this means. Unlike availability, it is actually possible to guarantee consistency 100% of the time, so that is often the more useful guarantee to make. If you are not making those guarantees it’s important that users know exactly when they will be making inconsistent reads/writes.

Of course this is a brand new project so I’m sure docs are still WIP. I did a quick scan through the code and it looks very nice, so again - great work :slight_smile:

You’re absolutely right — that was definitely an oversight in the docs :sweat_smile:

I’ve updated the description to clarify that this doesn’t prioritize availability. That was definitely an inaccurate implication before.

Seeing this phrase makes me twitch at this point. LLMs love it though. :003:

Library looks good, IMO it was time for a new player in the area. I too took a quick look in the code. Good impressions so far.

So many missed opportunities related to name :frowning:

Like Ragistry, RAG, Rage, Draft, Dragon, Distra, and my personal fave Dijkstra (which is a full acronym btw, feel free to guess the 8 words behind)? :slight_smile:

I was thinking more about something like Imhotep, CleopatRa, or stuff like BadRomance.

Fair; then again, the name is quite… clear. I do value that.

@eliasdarruda, thank you for this contribution to the ecosystem! <3
