Ra_registry — A Distributed Registry based on Raft using rabbitmq/ra

eliasdarruda · April 13, 2025, 5:38am

Introduction

Introducing RaRegistry: A Distributed, Raft-Backed Process Registry for Elixir

If you’re building distributed Elixir applications and need a reliable, consistent process registry, RaRegistry is a new library worth exploring. It offers a drop-in alternative to Elixir’s built-in Registry, but with distributed consensus powered by Ra, RabbitMQ’s implementation of the Raft protocol.

Features:

Support for both :unique and :duplicate registration modes
Automatic process monitoring and cleanup
Built on Ra, RabbitMQ’s implementation of the Raft consensus protocol
Strongly consistent operations during normal cluster operation (while a majority of nodes is available)
Familiar API similar to Elixir’s built-in Registry
Enhanced recovery mechanisms for abrupt node failures, such as a node killed with SIGKILL
Seamless integration with GenServer via the :via tuple registration
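
To make the two registration modes concrete, here is a minimal sketch that uses Elixir's built-in Registry rather than RaRegistry, since the API is described as similar. The registry names (Demo.UniqueRegistry, Demo.DuplicateRegistry) are made up for illustration, and RaRegistry's exact functions and return values may differ.

```elixir
# Built-in Registry, used here only to illustrate :unique vs :duplicate keys.
# Both registry names are hypothetical.
{:ok, _} = Registry.start_link(keys: :unique, name: Demo.UniqueRegistry)
{:ok, _} = Registry.start_link(keys: :duplicate, name: Demo.DuplicateRegistry)

# :unique allows a single owner per key.
{:ok, _} = Registry.register(Demo.UniqueRegistry, "user_123", :meta)

# A second process asking for the same key is rejected and told who owns it.
task = Task.async(fn -> Registry.register(Demo.UniqueRegistry, "user_123", :other) end)
{:error, {:already_registered, owner}} = Task.await(task)
true = owner == self()

# :duplicate allows several entries under one key.
{:ok, _} = Registry.register(Demo.DuplicateRegistry, "topic", :a)
{:ok, _} = Registry.register(Demo.DuplicateRegistry, "topic", :b)
entries = Registry.lookup(Demo.DuplicateRegistry, "topic")
IO.inspect(length(entries), label: "entries under \"topic\"")
```

With the built-in Registry this only works on one node. The point of RaRegistry is to give the same kind of guarantee across the cluster.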

Usage

Dependency

def deps do
  [
    {:ra_registry, "~> 0.1.2"}
  ]
end

Example implementation

defmodule MyApp do
  use Application

  # Add RaRegistry to your application supervision tree
  @impl true
  def start(_type, _args) do
    children = [
      # Start RaRegistry after any node discovery mechanism, such as libcluster.
      # Any configuration for the :ra cluster can be passed under ra_config.
      # wait_for_nodes_range_ms is a range between two millisecond values; a random
      # wait within it helps ensure the nodes are properly connected.
      {
        RaRegistry,
        keys: :unique, # or :duplicate
        name: MyApp.Registry,
        ra_config: %{data_dir: ~c"/tmp/ra"},
        wait_for_nodes_range_ms: 3000..5000
      },
      
      # Other children in your supervision tree...
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule MyApp.Server do
  use GenServer
  
  def start_link(opts) do
    # Start with an empty map: the handle_call/3 clauses below use Map.get/2 and Map.put/3
    GenServer.start_link(__MODULE__, %{}, name: via_tuple(opts[:id]))
  end
  
  def call(id, message) do
    GenServer.call(via_tuple(id), message)
  end

  defp via_tuple(id), do: {:via, RaRegistry, {MyApp.Registry, id}}
  
  # GenServer implementation
  def init(state), do: {:ok, state}
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
  def handle_call({:set, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
end

# Then, in your application code:
{:ok, pid} = MyApp.Server.start_link(id: "user_123")

# This call will work from any node in the cluster
MyApp.Server.call("user_123", {:set, :name, "John"})
MyApp.Server.call("user_123", {:get, :name}) # => "John"

# Returns :already_started no matter which node tries to start the server
{:error, {:already_started, ^pid}} = MyApp.Server.start_link(id: "user_123")
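
For readers new to :via tuples: GenServer accepts any module in {:via, Module, term} as long as it exports register_name/2, unregister_name/1, whereis_name/1 and send/2. The sketch below is a hypothetical single-node stand-in (LocalViaSketch, backed by an Agent) that shows the shape of that contract. It is not how RaRegistry is implemented, and it skips cleanup of dead processes.

```elixir
defmodule LocalViaSketch do
  # Hypothetical single-node registry: keeps a name => pid map in an Agent.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Called when a process starts with a :via name; must return :yes or :no.
  def register_name(name, pid) do
    Agent.get_and_update(__MODULE__, fn names ->
      case names do
        %{^name => _existing} -> {:no, names}
        _ -> {:yes, Map.put(names, name, pid)}
      end
    end)
  end

  def unregister_name(name), do: Agent.update(__MODULE__, &Map.delete(&1, name))

  # Must return the registered pid or :undefined.
  def whereis_name(name), do: Agent.get(__MODULE__, &Map.get(&1, name, :undefined))

  # Delivers a message to the registered process, exiting if there is none.
  def send(name, message) do
    case whereis_name(name) do
      :undefined ->
        exit({:badarg, {name, message}})

      pid ->
        Kernel.send(pid, message)
        pid
    end
  end
end

{:ok, _} = LocalViaSketch.start_link()
name = {:via, LocalViaSketch, {:counter, 1}}

{:ok, pid} = Agent.start_link(fn -> 0 end, name: name)
# A second start under the same name reports the existing process.
{:error, {:already_started, ^pid}} = Agent.start_link(fn -> 0 end, name: name)
# Calls through the :via tuple are resolved with whereis_name/1.
0 = Agent.get(name, & &1)
```

RaRegistry plays the role of LocalViaSketch in the example above, with the lookup table replicated through Ra instead of living in one local process.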

Consistency and Recovery

Consistency Model

RaRegistry offers these consistency guarantees:

Normal Operation: Operations use the Raft consensus protocol via Ra, providing strong consistency when a majority of nodes are available
State Machine Atomicity: Operations within the Ra state machine are atomic and either fully succeed or have no effect
Best-Effort Recovery: During failure scenarios like SIGKILL of the leader, our implementation employs aggressive recovery mechanisms that prioritize cluster recovery
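
As a quick illustration of "a majority of nodes" in the Normal Operation point, the arithmetic below is the usual Raft quorum rule. The QuorumSketch module is only for illustration and is not part of RaRegistry or Ra.

```elixir
defmodule QuorumSketch do
  # Smallest number of members that forms a majority in a cluster of n.
  def quorum(n) when is_integer(n) and n > 0, do: div(n, 2) + 1

  # Number of members that can be down while a majority is still reachable.
  def tolerated_failures(n) when is_integer(n) and n > 0, do: n - quorum(n)
end

for n <- [3, 4, 5] do
  IO.puts(
    "#{n} nodes: quorum #{QuorumSketch.quorum(n)}, " <>
      "tolerates #{QuorumSketch.tolerated_failures(n)} down"
  )
end
```

A 3-node cluster keeps accepting writes with one node down, and a 5-node cluster with two down. Going from 3 to 4 nodes does not increase the number of failures tolerated.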

It’s important to understand that:

The custom recovery mechanisms we’ve implemented extend beyond the standard Raft protocol
After recovery, the system returns to a consistent state, though some in-flight operations might result in errors due to incomplete execution
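
To illustrate the state machine atomicity point, here is a hedged sketch of a pure, deterministic state machine. It does not use Ra and is not RaRegistry's actual machine; the module name, command shapes and replies are all assumptions. Each command either produces a new state or is rejected with the state returned untouched, so replicas replaying the same log end up with the same result.

```elixir
defmodule RegistryMachineSketch do
  # Hypothetical state machine: the state is a map of key => pid.
  # Every command returns {new_state, reply}.

  def init, do: %{}

  # Rejected registrations return the state unchanged (no partial effect).
  def apply_command({:register, key, pid}, state) do
    case state do
      %{^key => existing} -> {state, {:error, {:already_registered, existing}}}
      _ -> {Map.put(state, key, pid), :ok}
    end
  end

  def apply_command({:unregister, key}, state), do: {Map.delete(state, key), :ok}

  # Replays an ordered command log from the initial state, as each replica would.
  def replay(log) do
    Enum.reduce(log, init(), fn command, state ->
      {new_state, _reply} = apply_command(command, state)
      new_state
    end)
  end
end

a = self()
b = spawn(fn -> :ok end)

log = [
  {:register, "user_123", a},
  # Conflicts with the entry above, so it leaves the state as it was.
  {:register, "user_123", b},
  {:register, "user_456", b}
]

state = RegistryMachineSketch.replay(log)
IO.inspect(state, label: "state after replay")
```

In this model a client whose command never made it into the log (for example because the leader was killed mid-request) sees an error, but the replicated state itself never contains half of an operation.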

Recovery Capabilities

RaRegistry includes specialized recovery mechanisms to handle various failure scenarios:

Automatic leader election after clean node failures
Emergency recovery procedures for SIGKILL scenarios
Self-healing mechanisms when nodes rejoin the cluster
Cleanup of dead process registrations
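
The cleanup of dead process registrations can be pictured with plain process monitors. The sketch below is a local, single-node approximation under my own assumptions (the MonitorCleanupSketch module and its functions are hypothetical); RaRegistry's real cleanup goes through the Ra cluster and may work differently.

```elixir
defmodule MonitorCleanupSketch do
  # Hypothetical local registry that drops an entry when its process dies.
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def register(key, pid), do: GenServer.call(__MODULE__, {:register, key, pid})
  def lookup(key), do: GenServer.call(__MODULE__, {:lookup, key})

  @impl true
  def init(_), do: {:ok, %{keys: %{}, refs: %{}}}

  @impl true
  def handle_call({:register, key, pid}, _from, state) do
    # Monitor the process so a :DOWN message arrives when it exits.
    ref = Process.monitor(pid)
    keys = Map.put(state.keys, key, pid)
    refs = Map.put(state.refs, ref, key)
    {:reply, :ok, %{state | keys: keys, refs: refs}}
  end

  def handle_call({:lookup, key}, _from, state) do
    {:reply, Map.fetch(state.keys, key), state}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state) do
    # Remove the registration that belonged to the dead process.
    {key, refs} = Map.pop(state.refs, ref)
    {:noreply, %{state | keys: Map.delete(state.keys, key), refs: refs}}
  end
end

{:ok, _} = MonitorCleanupSketch.start_link()

worker =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

:ok = MonitorCleanupSketch.register("user_123", worker)
{:ok, ^worker} = MonitorCleanupSketch.lookup("user_123")

# Kill the worker abruptly, then give the registry a moment to handle :DOWN.
Process.exit(worker, :kill)
Process.sleep(100)
after_kill = MonitorCleanupSketch.lookup("user_123")
IO.inspect(after_kill, label: "lookup after kill")
```

On a single node the monitor fires even for a process killed with :kill. The harder case the list above refers to is a whole node disappearing, which is where the cluster-level recovery comes in.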

garrison · April 13, 2025, 5:12pm

Great work, it’s wonderful to see more consistent distributed libraries/systems being built in Elixir!

I think it’s very important to document exactly what this means. Unlike availability, it is actually possible to guarantee consistency 100% of the time, so that is often the more useful guarantee to make. If you are not making those guarantees it’s important that users know exactly when they will be making inconsistent reads/writes.

Of course this is a brand new project so I’m sure docs are still WIP. I did a quick scan through the code and it looks very nice, so again - great work

eliasdarruda · April 13, 2025, 5:54pm

You’re absolutely right — that was definitely an oversight in the docs

I’ve updated the description to clarify that this doesn’t prioritize availability. That was definitely an inaccurate implication before.

dimitarvp · April 13, 2025, 7:41pm

Seeing this phrase makes me twitch at this point. LLMs love it though.

Library looks good, IMO it was time for a new player in the area. I too took a quick look in the code. Good impressions so far.

hauleth · April 14, 2025, 8:45am

So many missed opportunities related to the name

mudasobwa · April 14, 2025, 12:23pm

Like Ragistry, RAG, Rage, Draft, Dragon, Distra, and my personal fave Dijkstra (which is a full acronym btw, feel free to guess the 8 words behind)?

hauleth · April 14, 2025, 3:33pm

I was thinking more about something like Imhotep, CleopatRa, or stuff like BadRomance.

nulltree · April 14, 2025, 9:05pm

Fair; then again, the name is quite… clear. I do value that.

@eliasdarruda, thank you for this contribution to the ecosystem! <3