Hi everyone,
I’m working on my college thesis: Distributed BEAM Compute and Capability-Based Routing for the Nerves Platform. I’d
love feedback from people who’ve worked on distributed Elixir, Nerves, or P2P systems.
The Problem
Nerves devices are typically managed from a central server. I want to flip that — a network of embedded BEAM nodes that are fully autonomous, self-organising, and can route work to each other based on what they can do, not where they are.
The target environment is unreliable networks — construction sites, warehouses, field deployments — where devices come and go, may be behind NAT or CGNAT, and there’s no central infrastructure.
The Vision
The API should feel like native OTP. A node advertises what it has:
Network.start_advertising([cpu: 4, gpu: 2, storage: 1024])
Any other node can then route work to a capable peer transparently:
# Spawn a process on any node with a camera
Network.spawn([camera: true], fn -> capture_image() end)

# Run a task on any node with at least two GPUs and await the result
result =
  Network.Task.async([gpu: 2], fn -> run_inference(data) end)
  |> Task.await()

# send/receive work normally across the mesh
pid =
  Network.spawn([storage: 512], fn ->
    receive do
      {:write, data} -> store(data)
    end
  end)

send(pid, {:write, payload})
The key goal: OTP works normally. Task, GenServer, send/receive, linking, monitoring — all transparent across the
mesh.
Functional Requirements
- Nodes auto-discover each other (zero config)
- Nodes advertise capabilities and can be found by capability query
- NAT and CGNAT traversal
- Sparse mesh — nodes gossip, not fully connected
- Tasks are scheduled, monitored, and fault-tolerant
- Standard OTP patterns work transparently across nodes
Current Architecture
After a lot of research, here’s where I’ve landed:
Discovery & Transport: libp2p (Rust via Port)
Elixir’s libp2p support is limited, so I’m running a Rust libp2p binary as an Erlang Port. It handles:
- mDNS for local zero-config discovery
- Kademlia DHT for discovery across subnets and NAT
- DCUtR hole-punching and relay fallback for CGNAT
- Communicates with Elixir over a JSON line protocol on stdin/stdout
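A minimal sketch of the line-framed Port side of this bridge, using only the Elixir standard library. The module name `PortBridge`, the message shape, and the use of `cat` as a stand-in for the Rust binary are all assumptions for illustration; JSON encoding and decoding are left out, so each message is treated as an opaque line.

```elixir
defmodule PortBridge do
  # Longest line the Port delivers in one piece. Longer lines arrive as
  # {:noeol, chunk} fragments, which this sketch does not reassemble.
  @max_line 65_536

  # Starts the external program; `executable` must be a full path.
  def open(executable, args \\ []) do
    Port.open({:spawn_executable, executable}, [
      :binary,
      :exit_status,
      {:line, @max_line},
      {:args, args}
    ])
  end

  # Sends one message. The trailing newline is the frame delimiter.
  def send_line(port, line) when is_binary(line) do
    true = Port.command(port, [line, ?\n])
    :ok
  end

  # Waits for the next complete line (newline already stripped).
  def recv_line(port, timeout \\ 1_000) do
    receive do
      {^port, {:data, {:eol, line}}} -> {:ok, line}
      {^port, {:exit_status, status}} -> {:error, {:exited, status}}
    after
      timeout -> {:error, :timeout}
    end
  end
end

# Usage: `cat` echoes every line back, standing in for the Rust binary.
# The event payload below is a hypothetical message shape.
cat = System.find_executable("cat")
port = PortBridge.open(cat)
:ok = PortBridge.send_line(port, ~s({"event":"peer_discovered","peer":"node-a"}))
{:ok, line} = PortBridge.recv_line(port)
```

Because the Port is owned by the process that opened it, a real bridge would probably wrap this in a GenServer so that a crash of the Rust binary surfaces as an `:exit_status` message a supervisor can react to.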
Overlay Network: Partisan
Standard Erlang distribution (EPMD, full mesh) doesn’t fit: it assumes reliable networks and full connectivity.
Partisan is an alternative distribution layer for the BEAM (it has its own messaging API, so it is not strictly drop-in) that supports:
- Configurable topologies (sparse, peer-to-peer, client-server)
- Works without EPMD
- Gossip-based membership (HyParView) and Plumtree broadcast
When libp2p discovers a peer, Partisan connects to it. From that point, all BEAM-level messaging goes through
Partisan.
Capability Registry: Horde
Once nodes are connected via Partisan, capabilities need to be replicated across the cluster. Horde uses a
CRDT-based distributed registry that tolerates node churn well. Each node registers its capabilities into
Horde.Registry on startup; Network.resolve([camera: true]) queries it to find a matching node.
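A rough sketch of the matching semantics a capability query could use. It does not call Horde; a plain map of node name to advertised keyword list stands in for the replicated registry, and the module name `CapabilitySketch` plus the rule "a numeric requirement means at least this much" are assumptions, not something the design above has fixed.

```elixir
defmodule CapabilitySketch do
  # True when every required capability is met by the advertised list.
  def satisfies?(advertised, required) do
    Enum.all?(required, fn {key, want} ->
      meets?(Keyword.get(advertised, key), want)
    end)
  end

  # Missing or explicitly disabled capabilities never match.
  defp meets?(nil, _want), do: false
  defp meets?(false, _want), do: false
  # `true` means "has some": a numeric count must be above zero.
  defp meets?(have, true) when is_number(have), do: have > 0
  defp meets?(_have, true), do: true
  # Numeric requirements are minimums.
  defp meets?(have, want) when is_number(have) and is_number(want), do: have >= want
  # Anything else must match exactly.
  defp meets?(have, want), do: have == want

  # Returns the sorted names of all nodes that satisfy `required`.
  def resolve(registry, required) do
    registry
    |> Enum.filter(fn {_node, caps} -> satisfies?(caps, required) end)
    |> Enum.map(fn {node, _caps} -> node end)
    |> Enum.sort()
  end
end

registry = %{
  "node-a" => [cpu: 4, gpu: 2, storage: 1024],
  "node-b" => [cpu: 2, camera: true, storage: 256],
  "node-c" => [cpu: 8, gpu: 0]
}

camera_nodes = CapabilitySketch.resolve(registry, camera: true)
gpu_nodes = CapabilitySketch.resolve(registry, gpu: 2)
```

Returning every match rather than a single node leaves the choice (least loaded, closest, random) to the scheduler.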
Scheduling: Horde.DynamicSupervisor
Network.Task.async routes to a node via the registry and spawns the task under a Horde.DynamicSupervisor. If that node dies
mid-task, Horde can restart the task on another node.
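To make the failover behaviour concrete, here is a local simulation that does not use Horde at all. Each "node" is a hypothetical runner function, the work runs in a monitored process, and if that process dies the work is retried on the next candidate. The module name `FailoverSketch` and the runner convention are assumptions for illustration only.

```elixir
defmodule FailoverSketch do
  # `candidates` is a list of {name, runner}; `runner` is a 1-arity function
  # that executes `work` on that (simulated) node.
  def run(candidates, work, timeout \\ 5_000)

  def run([], _work, _timeout), do: {:error, :no_capable_node}

  def run([{name, runner} | rest], work, timeout) do
    caller = self()
    ref = make_ref()

    # Run the work in a separate, monitored process so a crash is observable.
    {pid, mon} = spawn_monitor(fn -> send(caller, {ref, runner.(work)}) end)

    receive do
      {^ref, result} ->
        Process.demonitor(mon, [:flush])
        {:ok, name, result}

      {:DOWN, ^mon, :process, ^pid, _reason} ->
        # The "node" died before replying: try the next candidate.
        run(rest, work, timeout)
    after
      timeout ->
        Process.exit(pid, :kill)
        Process.demonitor(mon, [:flush])
        run(rest, work, timeout)
    end
  end
end

healthy = fn work -> work.() end
crashing = fn _work -> exit(:node_down) end

outcome =
  FailoverSketch.run([{"node-a", crashing}, {"node-b", healthy}], fn -> 6 * 7 end)
```

Restarting elsewhere re-runs the function from the beginning, so this only behaves well when the work is idempotent or its side effects are safe to repeat.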
The stack:
Application Layer
  Network.spawn / Network.Task.async
        ↓
Capability Registry (Horde.Registry)
  resolve([camera: true]) → node
        ↓
Overlay Transport (Partisan)
  Node.spawn / :partisan_rpc
        ↓
Discovery & NAT (libp2p via Rust Port)
  mDNS + KadDHT + DCUtR/Relay
Open Questions
- Horde + Partisan compatibility
Horde internally uses :pg and :erpc, which assume standard Erlang distribution. Partisan replaces the dist layer.
Has anyone successfully run Horde over Partisan, or is this a known incompatibility? Would I need to implement a
simpler CRDT registry directly over Partisan’s broadcast?
- KadDHT for capability registry vs. gossip
I’m currently leaning toward using Kademlia DHT (via the libp2p Rust bridge) only for bootstrap and peer discovery,
and then using Partisan’s plumtree broadcast for capability propagation once connected. Does this make sense, or is
there value in keeping capabilities in the DHT for nodes that aren’t yet connected?
- Rust Port vs. NIF
The Rust libp2p binary runs as an Erlang Port (stdin/stdout JSON). This is the safer option on embedded devices, since a
crashing Port only kills the external process while a crashing NIF takes down the whole BEAM, but it adds latency and
serialisation overhead. For a thesis prototype, a Port seems right. Does anyone have experience with this trade-off on Nerves specifically?
- Task transparency
For send/receive to work across nodes, the remote PID needs to be routable back to the caller. With standard Erlang dist this is automatic.
With Partisan, does forwarding remote PIDs work transparently, or does Partisan require explicit addressing?
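On the first question above (a simpler registry over Partisan's broadcast), one possible shape is a last-writer-wins map keyed by node. The sketch below shows only the state and merge function, with no Partisan calls. The module name `GossipRegistrySketch` is hypothetical, and it assumes each node is the only writer of its own entry, so a per-node version counter is enough to order updates.

```elixir
defmodule GossipRegistrySketch do
  # State shape: %{node => {version, capabilities}}.
  def new, do: %{}

  # Called by a node for its own entry; bumps the version on every change.
  def advertise(state, node, caps) do
    version =
      case state do
        %{^node => {v, _caps}} -> v + 1
        _ -> 1
      end

    Map.put(state, node, {version, caps})
  end

  # Merges a remote state into the local one, keeping the newest entry per
  # node. Under the single-writer assumption, equal versions carry equal
  # capabilities, so the merge is commutative, associative and idempotent.
  def merge(local, remote) do
    Map.merge(local, remote, fn _node, {vl, _} = el, {vr, _} = er ->
      if vr > vl, do: er, else: el
    end)
  end
end

a = GossipRegistrySketch.new() |> GossipRegistrySketch.advertise("node-a", cpu: 4)
b = GossipRegistrySketch.new() |> GossipRegistrySketch.advertise("node-b", camera: true)
a2 = GossipRegistrySketch.advertise(a, "node-a", cpu: 4, gpu: 2)

# Updates may arrive in any order and still converge to the same state.
merged1 = GossipRegistrySketch.merge(GossipRegistrySketch.merge(a, b), a2)
merged2 = GossipRegistrySketch.merge(a2, GossipRegistrySketch.merge(b, a))
```

This sketch has no removal: entries for departed nodes would have to be dropped based on membership events, and a node that restarts with a fresh counter would need a way (for example an epoch or timestamp in the version) to supersede its old entry.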
What I’m Not Doing (Yet)
- No libcluster — it assumes reliable networks and standard dist
- No central registry or broker
- No custom transport (relying on libp2p for that)
Things I might consider later:
- Running BEAM distribution directly over libp2p streams
The Rust bridge, Partisan config, and a basic Network.spawn stub are all in place. The registry and task routing are what I’m building next.
Would love feedback on the architecture, especially the Horde/Partisan question and whether the Rust Port approach is sensible for Nerves. Thanks!