Porting Sakana AI's TRINITY Qwen-based model to Elixir/Bumblebee/Nx/Axon

gtcode · May 22, 2026, 8:49pm

I want to zoom out a bit and write down where I think this is heading architecturally, because the recent progress changes the nature of the problem.

The thread started as “can Sakana AI’s TRINITY/Qwen router be ported to Elixir/Bumblebee/Nx/Axon at all?” The answer is now much closer to yes than it was a few weeks ago. The thin-SVD work in Nx PR #1753 removes the worst memory pressure from export. We have CUDA as the original reference lane, EMLX getting a clean 37/37 decision-stable pass, and Emily now close enough that it has a real runtime profile with its own empirical margins. The remaining differences are the kind I would expect from backend-specific numeric behavior: route hashes drift, but the actual routing decisions can be made stable with per-profile fixtures and margin floors.

That is a big deal. The “can Nx express this pipeline?” question is mostly behind us. The harder question now is where this code should live before the bring-up repo hardens into the permanent monolith.

Right now trinity_coordinator is doing too many jobs. That was the right way to get the port working, but it is the wrong long-term shape.

It currently contains, in one application:

the TRINITY coordination semantics
the Sakana/Qwen manifest and routing-head invariants
Bumblebee model loading
Axon routing-head execution
Nx/SVD export and reconstruction mechanics
safetensors slicing
artifact fetching and pinning
provider-agent pool wiring
trace plumbing
operator Mix tasks
runtime profile and XLA/EXLA/EMLX/Emily bring-up logic

As a lab bench, that is useful. As a library boundary, it is too heavy.

The concrete split I am converging on is:

Product / AppKit
    |
    | governed product boundary
    v
Mezzanine / OuterBrain / Citadel
    |
    | semantic, workflow, authority, review, provenance
    v
trinity_framework
    |
    | Trinity.* contracts and coordinator semantics
    |
    +--> trinity_bridge_inference
    |       |
    |       v
    |     inference
    |     provider LLM calls
    |
    +--> trinity_bridge_self_hosted_inference
            |
            v
       self_hosted_inference_core
            |
            v
       self_hosted_inference_bumblebee
            |
            | Qwen load + Sakana adapter patch + routing head projection
            v
       RouteLogits

Reusable ML substrate underneath:

  crucible_safetensors
  crucible_tensor_patch
  crucible_factorization
  crucible_model_registry

The important point is that trinity_framework already exists and already has real Trinity.* contract modules. There is also already a consumer using it in stack_lab’s examples/trinity_platform_roundtrip. The workspace scaffold is there: core/, bridges/, apps/, tools/. Most of those package slots are stubs today, but the shape is right.

So I would avoid turning trinity_coordinator into a bigger framework. The cleaner next step is to finish the framework that already exists, move the reusable pieces to their real owners, and let trinity_coordinator become a temporary reference deployment while the migration is happening.

The main boundary correction is that the coordinator should never see the hidden vector.

An earlier version of the design had this rough flow:

Qwen runtime extracts hidden state
    -> returns HiddenVectorRef or hidden-state payload
trinity_sakana_pipeline applies routing head
    -> route decision

That split puts pressure in the wrong place. It either transfers a large tensor over a boundary or makes the coordinator orchestrate a runtime-local tensor operation through an opaque handle. Both are awkward. The extraction and the routing-head projection belong together inside the model runtime.

The cleaner flow is:

trinity_sakana_pipeline builds a plan
    -> artifact pin
    -> adapter ref
    -> selected tensor/head spec
    -> TRINITY shape invariants

self_hosted_inference_bumblebee executes the runtime work
    -> load Qwen
    -> apply/cache Sakana adapter
    -> extract penultimate state
    -> run Axon routing head
    -> return small RouteLogits struct

trinity_coordinator_core applies semantics
    -> role/agent partition
    -> Worker/Thinker/Verifier choice
    -> RouteDecision

That keeps tensors where tensors belong. The model runtime owns Bumblebee, Axon, Nx backend placement, and the routing-head math. The coordinator core owns what the logits mean.
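As a rough sketch of that last hop, here is what "the coordinator core owns what the logits mean" could look like. The module names, the struct fields, and the assumption that the head output arrives already partitioned into agent scores and three role scores are all hypothetical; the real Trinity.* contracts may differ.

```elixir
defmodule Sketch.RouteLogits do
  # Hypothetical shape: a small, backend-free payload returned by the runtime.
  @enforce_keys [:agent_logits, :role_logits]
  defstruct [:agent_logits, :role_logits]
end

defmodule Sketch.RouteDecision do
  # Hypothetical shape of the semantic result the coordinator produces.
  @enforce_keys [:agent, :role]
  defstruct [:agent, :role]
end

defmodule Sketch.CoordinatorCore do
  alias Sketch.{RouteDecision, RouteLogits}

  @roles [:worker, :thinker, :verifier]

  # Interprets plain floats only; no Nx, Axon, or Bumblebee is involved here.
  def decide(%RouteLogits{agent_logits: agents, role_logits: roles}, agent_ids)
      when length(agents) == length(agent_ids) and length(roles) == 3 do
    %RouteDecision{agent: argmax(agent_ids, agents), role: argmax(@roles, roles)}
  end

  # Returns the label with the highest score (first one wins on ties).
  defp argmax(labels, scores) do
    {label, _score} =
      labels
      |> Enum.zip(scores)
      |> Enum.max_by(fn {_label, score} -> score end)

    label
  end
end
```

The only thing crossing the runtime boundary in this sketch is a handful of floats, which is the property I care about.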

The same idea applies to adapter identity. Sakana’s patch differs from LoRA in details, but operationally it behaves like an adapter: a content-addressed parameter-efficient modification that should be reused if the same patch is requested again. I would rather make that visible in the runtime lease model than hide it inside the Bumblebee backend as an ad hoc cache. self_hosted_inference_core should understand it at the lease level.

Concretely, SelfHostedInferenceCore.InstanceSpec should grow an optional adapter_ref, and RuntimeRegistry should be able to route by something like:

(backend_id, adapter_ref)

instead of only by backend/profile. That lets an already-warmed Qwen+Sakana-adapter runtime handle the next request with the same adapter hash. Without that, every distinct TRINITY patch plan risks becoming a fresh GPU process or a contested in-place mutation.
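A minimal sketch of that routing key, assuming a content-addressed adapter_ref derived from the patch bytes. Everything under the Sketch namespace is a hypothetical stand-in: the real SelfHostedInferenceCore.InstanceSpec and RuntimeRegistry have their own fields and process model, and the registry is modelled here as a plain map only to show the lookup.

```elixir
defmodule Sketch.InstanceSpec do
  # Hypothetical stand-in for SelfHostedInferenceCore.InstanceSpec; only the
  # fields needed for the routing key are shown.
  @enforce_keys [:backend_id]
  defstruct [:backend_id, :profile, adapter_ref: nil]
end

defmodule Sketch.AdapterRef do
  # Content-addressed identity: the same patch bytes always give the same ref.
  def from_bytes(patch_bytes) when is_binary(patch_bytes) do
    digest = :crypto.hash(:sha256, patch_bytes) |> Base.encode16(case: :lower)
    "sha256:" <> digest
  end
end

defmodule Sketch.RuntimeRegistry do
  alias Sketch.InstanceSpec

  # The routing key is {backend_id, adapter_ref}; adapter_ref may be nil.
  def routing_key(%InstanceSpec{backend_id: backend_id, adapter_ref: adapter_ref}),
    do: {backend_id, adapter_ref}

  # Reuses a warmed runtime when the key matches, otherwise starts and records one.
  def checkout(registry, %InstanceSpec{} = spec, start_fun) when is_function(start_fun, 1) do
    key = routing_key(spec)

    case Map.fetch(registry, key) do
      {:ok, runtime} ->
        {:reused, runtime, registry}

      :error ->
        runtime = start_fun.(spec)
        {:started, runtime, Map.put(registry, key, runtime)}
    end
  end
end
```

Because adapter_ref defaults to nil, specs without an adapter still key on {backend_id, nil}, which is why I hope the change can stay additive.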

This is the part of the migration I expect to be hardest. Adding adapter_ref may be additive, but it may also expose assumptions in the existing Ollama backend or runtime registry about what “instance identity” means. I would wait for the real CUDA head_route gate to pass through the new backend before calling this a casual field addition.

The ML utility split is also important, and Nx PR #1753 is a good example. One operation, Nx.LinAlg.svd, currently forces a git-SHA pin through the whole coordinator because export needs the better thin-SVD memory behavior. That tradeoff is fine in a bring-up repo. It becomes a problem when downstream consumers only want TRINITY routing semantics.

The SVD/SVF code should move to crucible_factorization. That one package can carry the Nx pin while it needs to. When the relevant Nx release lands on Hex, that package can relax the dependency. Other packages should be able to depend on a RouteDecision struct without inheriting a custom Nx commit.

The same split applies to the other ML mechanics:

crucible_safetensors
    lazy safetensors reads, row slicing, bounded chunk materialization

crucible_tensor_patch
    TensorPath, ParamTree, dtype/backend transfer policy, patch receipts

crucible_factorization
    SVD/SVF reconstruction, stage-check math, parity-report primitives

crucible_model_registry
    artifact pins, manifest hashes, HF/S3/local storage backends, lineage

One correction to my earlier thinking: I would fold the artifact pin and fetch logic into crucible_model_registry instead of creating a new crucible_artifacts package. The registry already exists in the North-Shore-AI namespace and already has the richer storage-backend concept. The coordinator’s artifact pin schema should be merged into that. But that needs a real pre-flight audit. If the registry’s notion of artifact identity, SHA scope, revision refs, or cache layout differs from TrinityCoordinator.ArtifactFetch.Pin, then the work becomes schema reconciliation instead of a module rename.

Inside trinity_framework, I would split the TRINITY pieces roughly like this:

core/trinity_contracts
    DTOs and behaviours:
    RouteDecision, RouteLogits, AgentCallIntent, ProviderPool contracts,
    trace events, model-runtime behaviour, agent-caller behaviour.
    No Nx, no Axon, no Bumblebee, no Req, no HF.

core/trinity_sakana_contracts
    Sakana/Qwen schemas and invariants:
    manifest shape, router-head dimensions, export spec,
    selected tensor keys, profile specs.
    Still no runtime deps.

core/trinity_sakana_pipeline
    TRINITY-specific plan generation:
    what adapter/head/tensor plan should be executed.
    It returns a plan rather than a hidden vector or model handle.

core/trinity_coordinator_core
    the pure coordination state machine:
    run loop, role injector, thinker/verifier policy,
    route-logit interpretation, budget enforcement.
    It should be testable with mock behaviours and no real model.

bridges/trinity_bridge_self_hosted_inference
    translates Trinity.ModelRuntime calls into self_hosted_inference_core leases
    and self_hosted_inference_bumblebee.route_with_head/3.

bridges/trinity_bridge_inference
    provider LLM calls through the existing :inference package.

bridges/trinity_bridge_trace
    trace sinks and redaction policy.

tools/trinity_ops
    the Mix tasks currently under mix trinity.*

apps/trinity_single_node
    the production-shaped local deployment app.

That last package matters because the long-lived deployment surface should move out of trinity_coordinator. The app that wires the single-node experience should live under the trinity_framework workspace as apps/trinity_single_node. It should own the runtime config reads: XLA_TARGET, HF_TOKEN, provider keys, runtime profile selection, artifact dirs, and so on. Those values should be materialized at the app boundary and passed down as config, so lower libraries receive config instead of reaching into env directly.
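To illustrate "materialized at the app boundary and passed down", here is a small sketch. XLA_TARGET and HF_TOKEN come from the list above; the TRINITY_PROFILE and TRINITY_ARTIFACT_DIR variable names, the default values, and the Sketch modules are assumptions made up for the example.

```elixir
defmodule Sketch.SingleNodeConfig do
  # Hypothetical config struct, built once at the app boundary.
  @enforce_keys [:xla_target, :artifact_dir]
  defstruct [:xla_target, :hf_token, :artifact_dir, runtime_profile: :cuda]

  @profiles %{"cuda" => :cuda, "emlx" => :emlx, "emily" => :emily}

  # `env` is a plain map (for example System.get_env() called from
  # config/runtime.exs), so this function never reads the process environment.
  def from_env(env) when is_map(env) do
    with {:ok, xla_target} <- Map.fetch(env, "XLA_TARGET"),
         {:ok, profile} <- Map.fetch(@profiles, Map.get(env, "TRINITY_PROFILE", "cuda")) do
      {:ok,
       %__MODULE__{
         xla_target: xla_target,
         hf_token: Map.get(env, "HF_TOKEN"),
         artifact_dir: Map.get(env, "TRINITY_ARTIFACT_DIR", "priv/artifacts"),
         runtime_profile: profile
       }}
    else
      :error -> {:error, :invalid_env}
    end
  end
end

defmodule Sketch.ArtifactStore do
  # A lower library receives config as an argument instead of reading env.
  def path_for(%Sketch.SingleNodeConfig{artifact_dir: dir}, name), do: Path.join(dir, name)
end
```

In the real app the call would sit in config/runtime.exs or application start, along the lines of `Sketch.SingleNodeConfig.from_env(System.get_env())`, and everything below it would only ever see the struct.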

During migration I would keep the old coordinator config authoritative until the new app proves equivalence. In practice that means running the same HITL head-route gate first against trinity_coordinator’s config/runtime.exs, then against trinity_single_node’s config/runtime.exs. Only after both pass should the original config be retired.

There are also some boring but necessary enforcement rules:

No System.get_env under lib/**
    Runtime/deployment env reads stay in config/runtime.exs or app bring-up.

No Nx/Axon/Bumblebee in contracts
    Contract packages must remain lightweight and CPU/node friendly.

No direct SelfHostedInferenceCore/Bumblebee/Crucible imports in core packages
    Those imports belong in bridges, runtimes, or ML utility packages.

No product bypass
    Product code should go through AppKit/Mezzanine/Trinity surfaces,
    not directly into lower runtime or ML packages.
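The first rule is easy to check mechanically. A minimal sketch, assuming a standalone checker module (the name Sketch.EnvReadCheck and the choice of banned functions are mine, not an existing tool); it walks the AST, so it only catches direct System.* calls and would miss an aliased System module.

```elixir
defmodule Sketch.EnvReadCheck do
  # Env-reading functions on System that should not appear under lib/**.
  @banned [:get_env, :fetch_env, :fetch_env!]

  # Returns the sorted line numbers of banned System calls in a source string.
  def violations(source) when is_binary(source) do
    {_ast, lines} =
      source
      |> Code.string_to_quoted!()
      |> Macro.prewalk([], fn
        {{:., meta, [{:__aliases__, _, [:System]}, fun]}, _, _args} = node, acc
        when fun in @banned ->
          {node, [meta[:line] | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.sort(lines)
  end

  # Scans every .ex file under `lib_dir` and returns {path, lines} for offenders.
  def scan(lib_dir) do
    for path <- Path.wildcard(Path.join(lib_dir, "**/*.ex")),
        lines = violations(File.read!(path)),
        lines != [],
        do: {path, lines}
  end
end
```

Something like `Sketch.EnvReadCheck.scan("lib")` returning a non-empty list could fail CI; the import-boundary rules would need a similar pass over alias and import nodes.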

The last point ties this back to the older governed execution platform thread. That thread is out of date in details, but the boundary argument still applies. The larger stack has separate places for semantic reasoning, authority, workflow, lower facts, traces, review, and execution because those are different kinds of truth. TRINITY should plug into that shape rather than bypass it.

In that larger picture:

app_kit is the product/operator surface.
mezzanine owns workflow, leases, installation context, review, promotion, and governed reads.
outer_brain owns semantic context assembly and provider-facing semantic validation.
citadel owns authority and advisory planning context.
jido_integration and execution_plane own lower facts and runtime effects.
self_hosted_inference_core owns the local model-runtime lease model.
trinity_framework owns the TRINITY-specific contracts and coordination semantics.
The crucible_safetensors, crucible_tensor_patch, crucible_factorization, and crucible_model_registry packages own reusable ML mechanics.

So the decomposition goes beyond moving files into smaller repos. It makes the TRINITY port obey the same truth boundaries as the rest of the platform.

The proposed phase order is intentionally conservative:

0. Freeze the monolith baseline.
1. Extract crucible_safetensors first, as the smallest proof of the pattern.
2. Extract crucible_factorization and move the Nx #1753 pin there.
3. Extract crucible_tensor_patch and merge artifact pins into crucible_model_registry.
4. Build self_hosted_inference_bumblebee and prove route_with_head/3 on CUDA.
5. Fill out trinity_framework core and bridges.
6. Move operator Mix tasks into tools/trinity_ops.
7. Make apps/trinity_single_node the deployment app.
8. Reduce trinity_coordinator to a compatibility shim.
9. Archive it after a release window.

The file audit gives this some teeth. The coordinator has about 60 .ex files under lib/, about 10.5 KLOC total. Roughly 40% belongs in trinity_framework, 30% in self_hosted_inference_bumblebee plus the North-Shore-AI crucible_* packages, 25% in ops/tasks, and the remaining 5% or so should be deleted because provider transport already belongs in inference.

Two examples show the logic:

TrinityCoordinator.Sakana.Artifact is currently three owners in one file: manifest validation, artifact fetch/pin semantics, and model-state patching. Those should become:

manifest validation       -> trinity_sakana_contracts
artifact pin/fetch/cache  -> crucible_model_registry
model-state patching      -> self_hosted_inference_bumblebee + crucible_tensor_patch

TrinityCoordinator.CoordinationHead likewise mixes two owners in one file and should be split in two:

role/agent/head invariants -> trinity_sakana_contracts
Axon model and projection  -> self_hosted_inference_bumblebee

That split is the whole design in miniature: contracts define what is valid, runtime executes tensor work, coordinator interprets small semantic results.
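To make the "testable with mock behaviours and no real model" claim for trinity_coordinator_core concrete, here is a sketch of the pattern. The behaviour name, the callback signature, the plan shape, and the turn budget are all hypothetical; the point is only that the run loop depends on a behaviour and never on Bumblebee or Nx.

```elixir
defmodule Sketch.ModelRuntime do
  # Hypothetical behaviour; the callback name and arity are assumptions.
  @callback route(plan :: map(), prompt :: String.t()) ::
              {:ok, %{role_logits: [float()]}} | {:error, term()}
end

defmodule Sketch.MockRuntime do
  @behaviour Sketch.ModelRuntime

  # Returns canned logits so the state machine can be tested without a model.
  @impl true
  def route(%{canned_role_logits: logits}, _prompt), do: {:ok, %{role_logits: logits}}
  def route(_plan, _prompt), do: {:error, :no_canned_logits}
end

defmodule Sketch.RunLoop do
  @roles [:worker, :thinker, :verifier]

  # Runs at most `budget` turns against whichever runtime module is injected.
  def run(runtime, plans, prompt, budget) when is_atom(runtime) and budget >= 0 do
    plans
    |> Enum.take(budget)
    |> Enum.reduce_while({:ok, []}, fn plan, {:ok, acc} ->
      case runtime.route(plan, prompt) do
        {:ok, %{role_logits: logits}} -> {:cont, {:ok, [pick_role(logits) | acc]}}
        {:error, reason} -> {:halt, {:error, reason}}
      end
    end)
    |> case do
      {:ok, roles} -> {:ok, Enum.reverse(roles)}
      error -> error
    end
  end

  # Interprets the small semantic result: highest-scoring role wins.
  defp pick_role(logits) do
    {role, _score} = @roles |> Enum.zip(logits) |> Enum.max_by(&elem(&1, 1))
    role
  end
end
```

Swapping Sketch.MockRuntime for a bridge module that talks to self_hosted_inference_core would leave the run loop untouched, which is the boundary I am after.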

I also want to preserve the git history when this moves. The history on files like sakana/svd.ex, coordination_head.ex, and the Sakana loader/patch code contains a lot of architectural decision-making that will not be obvious from the final file layout. Before trinity_coordinator is retired, I would subtree-merge or otherwise import the relevant history into trinity_framework.

My current summary would be:

trinity_coordinator was the right place to get the port working. The permanent library boundary should move elsewhere.

The permanent API should be Trinity.* in trinity_framework.

The permanent model runtime should be self_hosted_inference_bumblebee behind self_hosted_inference_core.

The reusable ML pieces should become North-Shore-AI crucible_* packages, with the existing crucible_model_registry extended rather than duplicated.

The single-node app should live as trinity_framework’s apps/trinity_single_node.

And the old coordinator repo should become a temporary compatibility shim, then retire.

If this sounds like a lot of decomposition for one model port, I think the recent backend work is exactly why it is justified. The port now spans CUDA, EMLX, and Emily lanes; it carries backend-specific profile behavior; it depends on a very specific Nx SVD improvement; and it is starting to look useful beyond the initial experiment. That is the point where “just keep it in the bring-up repo” starts creating future debt.

The goal is to avoid abstraction for its own sake and give each part one honest owner:

contracts describe
coordinator semantics decide
runtime executes tensor work
crucible packages do reusable ML mechanics
ops tasks operate
single-node app configures
AppKit/Mezzanine/OuterBrain/Citadel/Jido keep the governed platform honest

Very much open to any feedback!