Hi everyone,
I have been working on a large-scale, open-source platform for orchestrating and governing AI workloads on the BEAM (currently spanning various repositories under North-Shore-AI / nshkrdotcom).
Because AI agents are inherently non-deterministic and stateful, I wanted to build a system where governance precedes execution. Rather than letting LLMs directly execute code or API calls in a wild-west loop, every semantic turn, typed command, and external execution call passes through a strict, shared governance and durable-truth chain before it touches an outside system.
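To make "governance precedes execution" concrete, here's a minimal sketch of the idea as a pure pipeline. `GovernedCommand`, `Governance`, and the policy shape are illustrative names I'm using here, not the platform's actual API:

```elixir
defmodule GovernedCommand do
  # A command is plain data; nothing executes until governance approves it.
  @enforce_keys [:intent, :payload]
  defstruct [:intent, :payload, approved?: false, audit: []]
end

defmodule Governance do
  # Policies are predicates over the command. A command only reaches the
  # executor after every policy passes, and the decision is recorded.
  def authorize(%GovernedCommand{} = cmd, policies) do
    case Enum.find(policies, fn policy -> not policy.(cmd) end) do
      nil -> {:ok, %{cmd | approved?: true, audit: [:policy_pass | cmd.audit]}}
      _denied -> {:error, :policy_denied}
    end
  end

  # The executor callback is only ever invoked with an approved command.
  def execute({:ok, %GovernedCommand{approved?: true} = cmd}, executor), do: executor.(cmd)
  def execute({:error, reason}, _executor), do: {:error, reason}
end
```

The point of the sketch is that approval is a data transformation, so it can be unit-tested without any processes or external systems in the loop.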
Much of the core idea, particularly the work in citadel, came from Mike Hostetler and his team. The jido_integration package was also created per Mike's direction. I'm grateful to the Jido team for that opportunity.
To manage such a large-scale endeavor, I've created a few helper projects for dealing with cumbersome monorepos: blitz and weld. Blitz makes it easy to run monorepo-wide quality checks, and Weld will assemble the various mix.exs projects in a monorepo into a single mix.exs project, combining the test suites and code and assembling shared documentation. Welding these monorepos will simplify publishing to Hex.
As the architecture progresses, I’d love to get some eyes on structural boundaries and OTP design philosophy from folks here who have distributed systems and BEAM expertise. I’m using this architecture to learn distributed systems design and implementation in practice. Constructive criticism is welcomed, especially since this is still early stage greenfield development.
High-Level Architecture
The platform is split into strict, structural ownership boundaries. No layer is allowed to bypass the layer below it.
[ Products / UIs / Workbenches ]
↓
[ Northbound Surface ]
* app_kit (Product-facing entry point)
* mezzanine (Universal business machine & operational-semantic engine)
↓
[ Brain Chain ] (Semantic → Typed → Governed)
* outer_brain (Semantic reasoning)
* citadel (Governance, policy, and intent shaping)
↓
[ The Spine ] (Durable Truth)
* jido_integration (Durable intake, auth, control plane, review truth)
↓
[ Execution Substrate ] (The "Hazmat Zone")
* execution_plane (Transport, placement, sandboxing, raw facts)
* Family Kits (REST, GraphQL, CLI/Subprocess, Python/Snakepit)
[ Foundational Substrate ] (Underlies everything)
* ground_plane (IDs, fences, leases, checkpoints, generic persistence)
Core BEAM / OTP Principles
To avoid turning this into an unmanageable “process soup,” I’ve adopted a few hard rules:
- Strict Structural Ownership: The Execution Substrate handles transport and placement, but carries zero durable business meaning. The Spine owns durable truth. The Brain shapes intent.
- Data & Contracts over Processes: Define stable data, pure compilers, reducers, and projectors first. Only wrap things in OTP processes (GenServers, Supervisors) where fault tolerance, state recovery, or concurrency explicitly require them.
- Traceability by Design: Because debugging AI is like forensic analysis on a dream, lineage and audit trails are first-class. Every action maps back to a governed decision.
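As an illustration of the "data & contracts over processes" rule, here's a hedged sketch: a pure reducer that can be tested with no processes at all, plus a thin GenServer wrapper that exists only for runtime concerns. All module and event names below are hypothetical, not the platform's real ones:

```elixir
defmodule Turn.Reducer do
  # Pure core: data in, data out. No processes needed to test this.
  def init_state, do: %{turns: 0, history: []}

  def apply_event(state, {:turn_completed, summary}) do
    %{state | turns: state.turns + 1, history: [summary | state.history]}
  end

  def apply_event(state, _ignored), do: state
end

defmodule Turn.Server do
  # The OTP wrapper exists only for fault tolerance and state recovery;
  # every piece of business logic lives in the pure reducer above.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: {:ok, Turn.Reducer.init_state()}

  @impl true
  def handle_cast({:event, event}, state),
    do: {:noreply, Turn.Reducer.apply_event(state, event)}
end
```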
Where I’d Love Your Input
While I am confident in the functional separation, distributed orchestration introduces specific challenges where I'd value expert insight from the community. LLMs can certainly provide guidance on design, tooling, and testing, but nothing beats feedback from a distributed systems engineer.
1. Distributed State & “The Spine”
jido_integration acts as the durable-truth layer (intake, auth, control plane). Is concentrating durable truth in a single layer like this a sound pattern on the BEAM, or does it create an obvious bottleneck?
2. The Execution Substrate (“Hazmat Zone”)
The execution_plane isolates dangerous work (subprocess execution, Python bridges, API calls). I want to ensure that failures here cascade cleanly up to the Spine without taking down the governance layers above it.
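One standard BEAM pattern for this kind of isolation, sketched below with hypothetical module names, is to run hazmat work under its own DynamicSupervisor with `:temporary` children, and to have callers monitor rather than link, so a crash arrives as a `:DOWN` fact to record instead of an exit signal that propagates into the governance layers:

```elixir
defmodule Hazmat.Supervisor do
  use DynamicSupervisor

  def start_link(opts), do: DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: DynamicSupervisor.init(strategy: :one_for_one)

  # Run a dangerous function as a :temporary child: if it crashes, it is
  # not restarted and the supervisor (which traps exits) survives.
  def run(task_fun) do
    spec = %{id: Hazmat.Worker, start: {Task, :start_link, [task_fun]}, restart: :temporary}
    {:ok, pid} = DynamicSupervisor.start_child(__MODULE__, spec)
    # Monitor instead of link: the caller receives {:DOWN, ref, :process,
    # pid, reason} -- a raw fact it can report upward -- rather than dying.
    ref = Process.monitor(pid)
    {pid, ref}
  end
end
```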
3. “Pure Core” vs OTP Overhead
I’m actively trying to keep business logic (in mezzanine and citadel) as pure functions, pushing side-effects to the edges. However, orchestrating multi-step LLM reasoning loops often requires suspending and resuming state.
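One way I'm approaching suspend/resume while keeping the core pure is to model the loop as an explicit, serializable state machine: each step either finishes or returns a side-effect request that the edge performs before feeding the reply back in, so "suspend" is just persisting the struct. A minimal sketch with illustrative names:

```elixir
defmodule ReasoningLoop do
  # The whole loop state is plain data, so it can be checkpointed and
  # resumed on another node by re-calling step/2.
  defstruct step: :plan, context: %{}

  # Step 1: ask the edge to perform an LLM call; the loop itself does no IO.
  def step(%__MODULE__{step: :plan} = loop, _reply) do
    {:request, {:llm, "make a plan"}, %{loop | step: :await_plan}}
  end

  # Step 2: fold the edge's reply back into the state and finish.
  def step(%__MODULE__{step: :await_plan} = loop, {:ok, plan}) do
    {:done, %{loop | step: :finished, context: Map.put(loop.context, :plan, plan)}}
  end
end
```

Because the only process involved is whatever drives `step/2` at the edge, there's no GenServer holding in-flight reasoning state that can be lost on a crash.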
4. Distributed Test Harness
stack_lab will enable testing distributed systems on my local development machine and in test environments; it's not a general framework but is built specifically for this stack.
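For local multi-node testing, OTP 25+'s `:peer` module is a natural building block. A sketch of the shape I have in mind, assuming a stdio control connection so the host node doesn't need distribution enabled (`ClusterCase` is a hypothetical helper, not part of stack_lab):

```elixir
defmodule ClusterCase do
  # Starts a peer OS process connected over standard IO; :peer.call/4
  # runs code on the peer via that control channel, and the peer is
  # always stopped, even if the test function raises.
  def with_peer(fun) do
    {:ok, pid} = :peer.start_link(%{connection: :standard_io})

    try do
      fun.(pid)
    after
      :peer.stop(pid)
    end
  end
end

# Usage inside a test:
# ClusterCase.with_peer(fn peer ->
#   assert 4 == :peer.call(peer, :erlang, :+, [2, 2])
# end)
```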
5. General Architectural Critique
Looking at the layer diagram above, do you spot any glaring anti-patterns or bottlenecks typical to BEAM distributed systems?
If you are interested in poking around the code, the core infrastructure lives across repos at nshkrdotcom. (Specific repos of interest might be execution_plane, citadel, and jido_integration).
While I am ultimately interested in the granular correctness of the code, at this stage it is expected to contain anti-patterns that will be resolved over time. I'm aware of this and not looking for feedback on minutiae right now, unless it's broadly relevant.
Here's a whitepaper that describes the architecture and roadmap in more detail. The code on GitHub is a bit out of date, pending completion and push of the current phase's work.
My goal is to nail down the big-picture architecture, after which I'm willing to revamp or rebuild any library as needed, and even to revisit the overall architectural boundaries if there's a good reason. Thanks in advance for any insights, critiques, or war stories you're willing to share!