1. Lifetime frame?
Yes — but not “model every byte forever.” It’s the long-term frame for load-bearing facts: boundaries, contracts, capabilities, evidence, lineage. Transient stuff (failed attempts, local patches) lives in lineage storage, not the main design surface.
2. Three loops — got it right?
Pretty much. One refinement on the middle loop: it’s not just size optimization. Size is one signal. The real goal is minimizing engineering cost while keeping behavioral, architectural, and evidence constraints intact. A smaller implementation that breaks an invariant is invalid.
On budget: it’s per SpecCell type, not global. A pure domain module gets a tight budget; a stateful process gets more room because it needs it.
If the solution can’t fit the budget, that’s a design signal, not a fatal error. Three outcomes: re-budget (the cell was under-budgeted), split (it’s too large), or redesign (the approach is over-mechanized).
3. Single source of truth, event-sourced?
Yes. The cleanest model is event-sourced at the semantic-fact level — spec asserted, invariant violated, normalizer applied, exception approved, etc. From that stream you can materialize the spec graph, implementation graph, evidence coverage, lineage traces, slop reports, whatever you need. Code stays the executable reality; the graph is the engineering truth.
4. Some LineageGraph concepts are general across projects?
Correct. Better to split it: a project-specific LineageGraph (cells, patches, exceptions, decisions) and a reusable Harness Doctrine Graph (skills, rules, detectors, normalizers, failure patterns). The reusable layer becomes your cross-project improvement dataset.
5. Good dataset for improvement?
Yes, and judgment traces are the most valuable output besides accepted code. A trace captures rejected designs, reasons for rejection, normalizer effects, and the accepted normal form — far richer than a normal commit. That’s what makes it useful for improving context bundles, rules, normalizers, repair classifiers, and model selection.
6. Skill-centered, no agents for planning/review — compatible?
Fully compatible. The architecture doesn’t need agent swarms. spec.audit, spec.bundle, spec.accept are just bounded operators — skills. The substrate owns authority, state, acceptance, and lineage. The LM fills a constrained hole; it doesn’t own the plan or verdict. The doc should probably say “bounded proposal operators” instead of making it sound agent-heavy.
7. Graph is true representation, changes are gated?
Yes, with one nuance: code is still a reality source. Brownfield or handwritten code can reveal the graph is incomplete. A graph mismatch doesn’t always mean “reject” — sometimes it means “the graph was missing a legitimate fact.” The important thing is drift can never silently merge.
8. Adversarial challenge?
Yes. The adversary attacks assumptions, not just code — can this invariant be bypassed? Can this credential leak through logs? Can this state transition happen out of order? Good adversarial findings get promoted into deterministic rules or property tests whenever possible.
9. Stable style/pattern guide vs. dynamic ENF?
You’re right that stability is a key quality. The resolution is layering ENF: a stable core that rarely changes, project policy that only changes via ADRs, experimental rules as warnings only, and explicit scoped exceptions. Living shouldn’t mean moving goalposts — it means stable doctrine plus evidence-driven exceptions and promotions.
10. Hyperparameter search on non-deterministic fuzzy processes — too many iterations?
Yes, if done naively. The system shouldn’t do broad HPO over fuzzy LLM judgments. Harness evolution should mostly tune deterministic or semi-deterministic things: context bundle contents, operator ordering, cost weights, normalizer selection, model choice by task class. Search should be small, cached, off the critical path, and judged by concrete metrics — fewer ENF violations, smaller implementation graph, more mutants killed, lower human review defects. LLM-as-judge can propose hypotheses but shouldn’t be the verdict engine.
11. Static analysis and tests don’t fix issues, they locate them — how does repair actually work?
Exactly right. Static analysis and tests produce evidence and counterexamples; they don’t fix anything. The repair loop is: detector finds violation → classify it → rebuild context bundle with the failure and allowed repair scope → bounded operator proposes a patch → same detector must pass → implicated invariants must pass. LLM proposes the repair, harness verifies it, mutation/adversary tests prevent shallow gaming. The resolved claim isn’t trusted until checked by deterministic evidence.
12. Lost here.
That section (AccessGraph) needs better explanation. Short version: it’s one substrate primitive that answers who may read/modify/execute/delegate anything — code, credentials, agent scope, all of it. The key distinction is read broad context, modify narrow scope, escalate for architecture changes.
13. Open, fully declared context initialization including hidden attributes, easy to spin up different modes?
Yes, that’s exactly the intent. Task intent, SpecCell, capability bundle, model settings, allowed files, forbidden actions, runtime assumptions, tool permissions, hidden harness defaults, cost budgets, trust zone — all declared. Hidden attributes should be harness-controlled, not undocumented ambient behavior. Enables easy mode switching: local-dev, strict CI, security-critical, brownfield audit, etc.
14. Granular credential setup makes sense?
Yes. The central object isn’t the raw secret — it’s an auditable, scoped, non-exportable lease. The agent holds a reference to the lease; only the trusted connector redeems it at the final effect boundary. Granularity lets you express exactly which connector can redeem, enforce expiry and revocation, and guarantee secrets never appear in logs or telemetry.
15. Nothing to say about code intelligence, but graphs and code seem like two sides of the same coin?
Your intuition is right. The key distinction is authority. Most code intelligence systems build a graph from code and treat it as a cache (code → graph). This inverts it: the graph is the source of truth, code is a projection. Visual programming and code are both projections of the same underlying structure — different projections are easier to work with for different kinds of work.
16. Living documents only good if automated. Granularity is unclear?
Agreed on both. SpecCells should exist at multiple levels — system, subsystem, component, operation, code-change — depending on risk and change surface. Don’t SpecCell every helper; do SpecCell load-bearing units: public APIs, capability boundaries, process lifecycles, external effects, credentialed operations. And yes, if humans have to manually update everything, the system fails. Automation is the requirement, not a nice-to-have.
Your existing loop (setup → test/implement/test → analysis/review → feedback) maps directly onto the three nested loops. The main difference is the living substrate stores and operationalizes the feedback instead of leaving it as informal skill memory. Complexity is real — ease of use and code output quality are the right things to optimize for.