Looking for feedback on my thesis project: Distributed BEAM Compute & Capability-Based Routing for Nerves


Hi everyone,

I’m working on my college thesis: Distributed BEAM Compute and Capability-Based Routing for the Nerves Platform. I’d
love feedback from people who’ve worked on distributed Elixir, Nerves, or P2P systems.


The Problem

Nerves devices are typically managed from a central server. I want to flip that — a network of embedded BEAM nodes that are fully autonomous, self-organising, and can route work to each other based on what they can do, not where they are.

The target environment is unreliable networks — construction sites, warehouses, field deployments — where devices come and go, may be behind NAT or CGNAT, and there’s no central infrastructure.


The Vision

The API should feel like native OTP. A node advertises what it has:

Network.start_advertising([cpu: 4, gpu: 2, storage: 1024])

Any other node can then route work to a capable peer transparently:

# Spawn a process on any node with a camera

Network.spawn([camera: true], fn -> capture_image() end)

# Run a task on any node with enough GPUs, await the result

result =
  Network.Task.async([gpu: 2], fn -> run_inference(data) end)
  |> Task.await()

# send/receive work normally across the mesh

pid =
  Network.spawn([storage: 512], fn ->
    receive do
      {:write, data} -> store(data)
    end
  end)

send(pid, {:write, payload})

The key goal: OTP works normally. Task, GenServer, send/receive, linking, monitoring — all transparent across the
mesh.


Functional Requirements

  • Nodes auto-discover each other (zero config)
  • Nodes advertise capabilities and can be found by capability query
  • NAT and CGNAT traversal
  • Sparse mesh — nodes gossip, not fully connected
  • Tasks are scheduled, monitored, and fault-tolerant
  • Standard OTP patterns work transparently across nodes

Current Architecture

After a lot of research, here’s where I’ve landed:

Discovery & Transport: libp2p (Rust via Port)

Elixir’s libp2p support is limited, so I’m running a Rust libp2p binary as an Erlang Port. It handles:

  • mDNS for local zero-config discovery
  • Kademlia DHT for discovery across subnets and NAT
  • DCUTR hole-punching and relay fallback for CGNAT
  • A JSON line protocol to Elixir over stdin/stdout
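
For the curious, the Elixir side of the bridge is just a thin GenServer around a Port. Here's a minimal sketch — the binary path, module names, and event fields are placeholders rather than the real protocol, and it assumes Jason for JSON decoding:

defmodule Network.Libp2pBridge do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # Spawn the Rust binary as a Port; each output line is one JSON event.
    port =
      Port.open({:spawn_executable, "/usr/bin/libp2p_bridge"}, [
        :binary,
        {:line, 8192},
        :exit_status
      ])

    {:ok, %{port: port}}
  end

  @impl true
  def handle_info({port, {:data, {:eol, line}}}, %{port: port} = state) do
    case Jason.decode(line) do
      {:ok, %{"event" => "peer_discovered"} = event} ->
        send(Network.PeerManager, {:peer_discovered, event})

      _other ->
        :ok
    end

    {:noreply, state}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    # Let the supervisor restart the bridge if the Rust process dies.
    {:stop, {:bridge_exited, status}, state}
  end
end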

Overlay Network: Partisan

Standard Erlang distribution (EPMD, full mesh) doesn’t fit — it assumes reliable networks and full connectivity.
Partisan is a replacement distribution layer that supports:

  • Configurable topologies (sparse, peer-to-peer, client-server)
  • Works without EPMD
  • Gossip-based membership (HyParView) and plumtree broadcast

When libp2p discovers a peer, Partisan connects to it. From that point, all BEAM-level messaging goes through
Partisan.
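
To sketch that hand-off: Partisan's join API has changed across versions, so treat the node-spec shape, module name, and JSON field names below as assumptions, not a definitive implementation.

defmodule Network.PeerJoiner do
  # Invoked when the Rust bridge reports a peer it has discovered and dialled.
  def handle_discovered(%{"node" => name, "ip" => ip, "port" => port}) do
    {:ok, ip_tuple} = :inet.parse_address(String.to_charlist(ip))

    # Hand the peer to Partisan so BEAM-level messaging can flow over the overlay.
    # String.to_atom/1 is fine for a prototype, but don't do it with untrusted input.
    :partisan_peer_service.join(%{
      name: String.to_atom(name),
      listen_addrs: [%{ip: ip_tuple, port: port}]
    })
  end
end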

Capability Registry: Horde

Once nodes are connected via Partisan, capabilities need to be replicated across the cluster. Horde provides a
CRDT-based distributed registry that tolerates node churn well. Each node registers its capabilities into
Horde.Registry on startup; Network.resolve([camera: true]) queries it to find a matching node.
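
To make that concrete, here's a minimal sketch of the register/resolve pair. It assumes a Horde.Registry started with unique keys under the hypothetical name Network.Registry, so each entry embeds the node name and matching is done with Horde.Registry.select/2; all module and registry names here are placeholders.

defmodule Network.Capabilities do
  # Register this node's capabilities. The registering process owns the entries,
  # so call this from a long-lived process (e.g. a GenServer started at boot).
  def advertise(caps) when is_list(caps) do
    Enum.each(caps, fn {cap, amount} ->
      Horde.Registry.register(Network.Registry, {:capability, cap, node()}, amount)
    end)
  end

  # Find all nodes advertising a given capability with at least `min` units.
  def resolve(cap, min \\ 1) do
    Horde.Registry.select(Network.Registry, [
      {{{:capability, cap, :"$1"}, :_, :"$2"}, [{:>=, :"$2", min}], [:"$1"]}
    ])
  end
end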

Scheduling: Horde.DynamicSupervisor

Network.Task.async routes to a node via the registry and spawns under a Horde.DynamicSupervisor. If the node dies
mid-task, Horde can restart it elsewhere.
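
And a rough sketch of the routing path, reusing the hypothetical Network.Capabilities.resolve/2 above. For brevity this version runs the function under a plain distributed Task.Supervisor (assumed to be started on every node as Network.TaskSupervisor) rather than showing the Horde.DynamicSupervisor hand-off:

defmodule Network.Task do
  # Route to the first node advertising the requested capability, then run the
  # function there. Returns a %Task{} that the caller can Task.await/1.
  def async(requirements, fun) do
    [{cap, min} | _] = requirements

    case Network.Capabilities.resolve(cap, min) do
      [target | _] -> Task.Supervisor.async({Network.TaskSupervisor, target}, fun)
      [] -> {:error, :no_capable_node}
    end
  end
end

Since Task.Supervisor.async/2 returns a regular %Task{}, the Network.Task.async |> Task.await() example from the Vision section keeps working unchanged.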

The stack:

Application Layer
Network.spawn / Network.Task.async
↓
Capability Registry (Horde.Registry)
resolve([camera: true]) → node
↓
Overlay Transport (Partisan)
Node.spawn / :partisan_rpc
↓
Discovery & NAT (libp2p via Rust Port)
mDNS + KadDHT + DCUTR/Relay

Open Questions

  1. Horde + Partisan compatibility

Horde internally uses :pg and :erpc, which assume standard Erlang distribution. Partisan replaces the dist layer.
Has anyone successfully run Horde over Partisan, or is this a known incompatibility? Would I need to implement a
simpler CRDT registry directly over Partisan’s broadcast?

  2. KadDHT for capability registry vs. gossip

I’m currently leaning toward using Kademlia DHT (via the libp2p Rust bridge) only for bootstrap and peer discovery,
and then using Partisan’s plumtree broadcast for capability propagation once connected. Does this make sense, or is
there value in keeping capabilities in the DHT for nodes that aren’t yet connected?

  3. Rust Port vs. NIF

The Rust libp2p binary runs as an Erlang Port (stdin/stdout JSON). This is safe for embedded (a crashing NIF takes
down the BEAM), but adds latency and serialisation overhead. For a thesis prototype, Port seems right. Anyone have
experience with this trade-off on Nerves specifically?

  4. Task transparency

For send/receive to work across nodes, the remote PID needs to be routable back to the caller. With standard Erlang dist this is automatic. With Partisan — does forwarding remote PIDs work transparently, or does Partisan require
explicit addressing?


What I’m Not Doing (Yet)

  • No libcluster — it assumes reliable networks and standard dist
  • No central registry or broker
  • No custom transport (relying on libp2p for that)

Things I might consider later:

  • Running BEAM distribution over libp2p streams


The Rust bridge, Partisan config, and a basic Network.spawn stub are all in place. The registry and task routing are what I’m building next.

Would love feedback on the architecture, especially the Horde/Partisan question and whether the Rust Port approach is sensible for Nerves. Thanks!


6 Likes

I have zero experience with Nerves, unfortunately, but I just wanted to cheer this project on. It looks very interesting and important. Best of luck on your journey!

4 Likes

I can’t claim to have done many Nerves projects, but I’ve done a few. Maybe more importantly, I found the theme interesting enough to read all of it. So here are my immediate brain drops:

I think differently about this part. I would not want to know what hardware a node has. That doesn’t really tell me anything about the actual functionality of a node.

What is interesting to know is what a node can do. I do not want to know how a node does it, nor should I need to know that in order to use it. For the most independent, exchangeable and future-proof nodes, I think a node’s functionality should be the focus.

For instance, GenICam is a standard for using various cameras across very different hardware standards and capabilities. Each camera has an XML file describing what it is, what it can do, and how to make it do that. Your program can ask for that file and thus learn what the camera can do. That list is typically quite extensive, and that is just for cameras. So I think a key challenge will be to define a general, clearly specified format that works for all hardware functions. (Maybe USB plug-and-play, with all the various hardware involved and supported there, could be an inspiration.)
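
Just to illustrate what I mean, a tiny made-up example of a functionality-first descriptor (all names invented):

# Purely illustrative: describe what a node can do, not what silicon it has.
%{
  node: :"cam-07@mesh",
  functions: [
    %{name: :capture_image, params: %{formats: [:jpeg, :raw], max_resolution: {1920, 1080}}},
    %{name: :store_blob, params: %{max_bytes: 512 * 1024 * 1024}}
  ]
}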

I would probably have tried to use and/or implement libp2p inside Elixir and avoid using Rust. Just squinting at it from a distance, distributed networking tasks seem more at home in Elixir than in Rust?

I use Rust NIFs rather than ports. From the documentation of one of my projects (not Nerves, but using Rust NIFs with Rustler):

Crash Resilience — Two-Layer Protection

SearchTantivy prevents BEAM VM crashes through two complementary mechanisms:

Layer 1 — NIF Panic Catching: Every Rust NIF entry point is wrapped with std::panic::catch_unwind. If tantivy panics (assertion failure, index corruption, unexpected state), the panic is caught and converted to {:error, "NIF panic: ..."} instead of crashing the BEAM VM. This is transparent — you handle these like any other error.

Layer 2 — OTP Supervision: The SearchTantivy.Index GenServer is managed by a DynamicSupervisor. If a GenServer crashes (unexpected message, linked process death), it is automatically restarted. The :one_for_all top-level strategy ensures the Registry and DynamicSupervisor stay in sync.

Error handling pattern:

# NIF panics and normal errors are handled the same way
case SearchTantivy.search(index, query, limit: 10) do
  {:ok, results} -> results
  {:error, "NIF panic: " <> reason} -> Logger.error("Engine error: #{reason}"); []
  {:error, reason} -> Logger.error("Search failed: #{reason}"); []
end

What can go wrong and what happens:

Failure                       What Happens              Your Code Sees
Rust panic (assertion, OOB)   catch_unwind catches it   {:error, "NIF panic: ..."}
Invalid query syntax          tantivy returns error     {:error, "query parse error: ..."}
GenServer crash               Supervisor restarts it    Next call works (or {:error, :noproc} briefly)
Index corruption              tantivy returns error     {:error, "..."} on open/search

Those are my brain drops. Interesting project. :slight_smile:

Can I ask you why those are unreliable networks? Dynamic, yes; spotty internet connection, yes, but within the cluster on site, why do you think the network is intrinsically unreliable? Here I use a narrow sense of “unreliable”: connectivity issues from problems that compound with the scale of the network.

I took your suggestion: I’m broadcasting what modules I’m hosting instead of pure hardware now, to keep it simple. I still want to push back a little on NIFs. The p2p bridge/native part runs as a daemon, and Nerves sort of lets you supervise it already. “The Bridge”, as I’m calling it, is long-running, and that’s the part that worries me: it might hog resources or mess with scheduling. For the current scope, a Port seems to serve me fine. As much as the engineer inside me wants to do libp2p in Elixir, the student in me knows I won’t make it in time for my graduation, lol. But in an ideal world, libp2p runs in Elixir and everything is BEAM native.
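
Roughly what the module-based advertising looks like now — the exact shape is still in flux, so treat these names as placeholders:

# Advertise hosted modules rather than raw hardware numbers.
Network.start_advertising(modules: [Vision.Capture, Blob.Store])

# Route on functionality the node actually exposes.
Network.spawn([module: Vision.Capture], fn -> Vision.Capture.snapshot() end)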

I’m not sure what speeds you need so ports might be just fine, and if not then it is always possible to change.

From the sound of it (distributed, unreliable networks where connections come and go), I just squinted and thought: that sounds like something that needs a solution built from the ground up for distribution, fault tolerance, and keeping nodes alive and connected.

Very good point about finishing before graduation though! Now I need to take a better look at libp2p. If that works on microprocessors, or maybe with AtomVM, that would be all the more interesting.

It looks like you’re trying to achieve a mesh network on the BEAM with a capability cluster map. There are a lot of different approaches to these problems, so let’s split it into separate problems:

  1. Some nodes are behind NAT
  2. Node discovery
  3. Network may be unreliable, and the messages can be spoofed
  4. Erlang by default requires full-mesh topology, but you can’t provide that
  5. We need capability mapping

Well, let’s address them one by one.

  1. First one is

    Some nodes are behind NAT

    You want to be using DCUTR, which is hole-punching in libp2p, but this algorithm doesn’t work reliably. Most of the time it doesn’t work at all, because it merely hacks the NAT mapping tables in routers, trying to reuse their fallback for stale connections, where a matched pair of outgoing packets from A to B and from B to A can result in an A ↔ B mapping being added to the NAT.

    So, in the real world you will most likely need a relay to transfer all the traffic, like TURN. But if you have that kind of relay, your cluster is now centralized.

    Let’s imagine we try to punch the NAT and fall back to TURN.

  2. Alright, now nodes can access each other in theory, but they still need some kind of rendezvous where they can learn each other’s addresses. Usually, some discovery service is used. You say

    mDNS

    but it works only in local networks, and we are behind NAT

    Kademlia DHT

    that one I don’t quite understand. First of all, it is the name of an algorithm; there is no global service or network by that name. Whether you use an existing network which implements some DHT or implement your own, each node still needs to know some peers before it’s deployed. But let’s imagine the node knows its peers and can access a DHT, what then? It would have to store a mapping like "my thesis project node N" -> IP address, but that needs to be verifiable, so you need some kind of DHT with a cryptographic proof that a given node may write to a given key, otherwise a malicious node could just poison the DHT.

  3. But alright, say we solved all the problems above and found some IP addresses we can reach and that we think our nodes live at. Then we need some way to verify that these are real nodes, not malicious ones. And we don’t want a middleman (for example a TURN relay, or just some malicious router) listening to our unencrypted traffic and maybe spoofing some packets. So we need TLS, and maybe some certificate verification if we don’t want to hardcode the public keys of every node in our network. Erlang distribution by default is unencrypted and uses a cookie. So you need an approach like the epmdless project takes, where each node performs a TLS handshake.

  4. Next, we either need a full-mesh topology with the default Erlang distribution, or an approach like Partisan. As far as I know, Partisan/LASP gained almost no commercial users. And that’s not because Partisan is bad (it is actually very cool), but because the built-in Erlang distribution can be tuned to support non-full-mesh topologies, to work without EPMD, etc. I’d suggest looking into WhatsApp’s talk about their huge cluster; they give an overview of their approach, and there are some starting points in the Erlang documentation about tuning. TL;DR: “use hidden nodes, disable -connect_all, disable global, tune the heartbeat timer” (a rough sketch follows after this list). It will kinda reduce all the features Erlang has by default, but still, that’s the best there is. You don’t need routing inside the cluster, because you are working over the internet, and at step 2 of my post you must already have the ability for every device to connect to every other device.

  5. Now about capability routing. It all boils down to maintaining cluster-wide state, a map from capability to a list of nodes. So it is essentially a consensus problem. It needs to be consistent, because we don’t want our Tasks to be executed on wrong or dead nodes. But our network is also bad, and the nodes are Nerves devices that can just turn off. So we need to make a CAP decision here. You use CRDTs, which means eventually consistent (AP), which means sometimes your Network.spawn will pick the wrong node. I don’t know if that’s good or bad (one way of living with it is sketched after this list).
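
For point 4, a rough sketch of that tuning as it might appear in a release’s vm.args. The flag names come from the standard Erlang docs; the values are placeholders and the exact combination depends on your OTP version:

# Don't maintain a fully connected network when nodes connect (global stops keeping full connectivity too)
-connect_all false
# Join as a hidden node so peers don't propagate this node to the rest of the cluster
-hidden
# Relax the distribution heartbeat for flaky links (value in seconds)
-kernel net_ticktime 120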
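
And for point 5, a hedged sketch of one way to live with an eventually consistent registry: treat the resolved node as a hint and verify it is still reachable before spawning. This assumes the tuned built-in distribution from point 4 and the hypothetical Network.Capabilities.resolve/2 sketched earlier in the thread:

defmodule Network.Spawn do
  # Walk the candidate nodes returned by the (eventually consistent) registry
  # and spawn on the first one that is currently connected; stale entries are
  # simply skipped.
  def spawn_checked(requirements, fun) do
    [{cap, min} | _] = requirements

    Network.Capabilities.resolve(cap, min)
    |> Enum.find_value({:error, :no_capable_node}, fn candidate ->
      if candidate in Node.list(:connected) do
        {:ok, Node.spawn(candidate, fun)}
      end
    end)
  end
end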


If I were to implement something like this in a commercial project, I would use an existing mesh overlay network like Yggdrasil. It provides e2e encryption, it’s stable and battle-tested, it provides a full mesh, it works by peering with nodes outside NAT, and it provides a virtual network interface, so it just solves all the problems from 1 to 4.

Overall it is a good idea and I wish you success with your thesis project. I can provide some other consultations or we can hop on a call, just contact me on the forum via private messages.

This would be outside of what I usually do, so I figured it would be a good exercise for my Claude skills.

So, since you will be doing the Port implementation, here is an Elixir NIF wrapper for the Rust libp2p library. It might be handy for comparison, or for whatever else you find useful. I put a little OTP application layer on top while I was at it.

The wrapper architecture with some added info.

The Elixir libp2p Rust wrapper

1 Like