A Structural Theory of Harnesses

Why Generalized Intelligence Lives in the Arrangement Around the Generator

April 2026 · Phillip Clapham

DOI: 10.5281/zenodo.19570642

Abstract

The AI industry has, over the last eighteen months, shifted the center of its engineering investment from scaling base models to building the scaffolding around them. The shift has a name now — harness engineering — given to it publicly by the Claude Code leak of March 2026 and formalized as a discipline within weeks by Red Hat, LangChain, Letta, AlphaSignal, and the career market. What the practice does not yet have is a theoretical account of why the engineering it has chosen to invest in is the only kind of engineering that could have worked.

This paper provides that account. A harness is defined functionally rather than substantively: a harness is the arrangement that closes the feedback loop between a generator and reality beyond the generator’s training distribution. Under this definition, the distinction between a harness and a generator is not about substance — harnesses are typically built from generators, and in biological cognition the harness components are stochastic prediction machines like the base generator they constrain — but about arrangement. What makes an arrangement a harness is that it forces a target generator into contact with experience the generator could not reach on its own and uses that contact to update the generator’s behavior in ways the generator cannot update itself. The arrangement must include at least one non-stochastic interface to reality — a deterministic I/O surface, a sensor, a user input, a tool call whose result is computed rather than sampled — because a loop composed purely of stochastic components is not a loop with reality at all. Harness properties (memory, identity, deliberation, affective state tracking, social competence, immune response against confabulation) are each a specific kind of feedback-loop closure — a specific channel through which a stochastic generator contacts the reality it did not memorize. Generalized intelligence — the capacity to transfer, accumulate, and coherently update across tasks and domains — is a property of the arrangement that closes these loops. Raw generators are narrowly intelligent within their training distribution on their own; generalization is the harness’s job.

The paper develops the theory in six moves. It opens with the functional definition and dissolves the “generators all the way down” objection by making arrangement rather than substance the load-bearing concept. It assembles convergent evidence from biological cognition — dual process theory, the default mode network, dementia progression, predictive processing, embodied cognition — establishing that the harness account is already the consensus position in cognitive neuroscience, though substrate chauvinism has kept it out of the AI debate. It documents the frontier AI industry’s revealed preference for harness-first engineering through what the labs actually ship. It walks through a mechanism catalog of six harness properties as feedback-loop channels, with failure modes predictable from which loops are missing. It catalogs three production failures (sycophancy amplification, memory-system collapse, psychosis scaffolding) as empirical validation that the theory predicts the failure modes that occur. And it offers an open-source agent memory architecture, anneal-memory, as a constructive existence proof that harness properties at the memory layer can be specified precisely, built as working code, and deployed today.

Three consequences of the theory are developed in later sections. Generalization requires embodiment, real-world feedback loops, and cost functions — all harness properties that raw generators structurally lack — which makes AGI a harness engineering problem rather than a model scaling problem. Substantive alignment is what identity-constrained behavior looks like from the outside when the identity has formed under appropriate conditions — what experience accumulated, what selection criteria operated, what cost functions shaped what got valued. Identity is an emergent property of memory plus selection over time, and the formation conditions are themselves harness properties, so alignment lives at the harness layer rather than the weights layer, which is why RLHF alignment is observably shallow. And cognitive sovereignty is a side effect of harness engineering quality, architecturally distinguishable from substitution-harness design on a clean technical axis.

The paper also reports a methodology failure that occurred during its own drafting — a citation fabrication produced by an automated research pipeline and caught by the same kind of verification architecture the thesis argues is necessary — and preserves the incident as a compressed exhibit of the theory operating on itself. The paper closes by situating the AI alignment debate as one instance of a general civilizational coherence crisis in which declared institutional states and operational reality are bifurcating, and the infrastructure that would verify the gap is exactly what is contested or absent. Trustable cognition of any kind — biological, digital, institutional — requires verification infrastructure at the layer where cognition occurs. This paper provides that infrastructure for the agent-memory case. The pattern generalizes.


1. The Harness Moment

During the week this paper was written, “harness engineering” got named as a discipline. Not by academic consensus. By revealed preference across the AI industry, happening fast enough that the career market responded before the literature did.

The proximate cause was the Claude Code leak on March 30, 2026, when an npm packaging error shipped Anthropic’s production agent scaffolding in an unobfuscated TypeScript form — roughly half a million lines of code, mirrored to GitHub within hours and confirmed authentic. The leak did not reveal a thin wrapper around a magic model. It revealed massive, elaborate infrastructure: memory systems, feature flags, multi-agent coordination logic, tool orchestration, session state, prompt-injection defenses, and context-window management. The scaffolding was the substantive artifact. Claude Code’s capability was not the model alone — the model was already available through Anthropic’s public API. The capability was the harness engineering wrapped around it. Five hundred thousand lines of scaffolding is not an implementation detail. It is, as the phrase now goes, the intelligence.

Within a week, Red Hat published “harness engineering” as a named technical discipline within its AI engineering practice. Harrison Chase at LangChain blogged “Your harness, your memory,” with a load-bearing quote from Sarah Wooders of Letta: “asking to plug memory into an agent harness is like asking to plug driving into a car. Managing context, and therefore memory, is a core capability and responsibility of the agent harness.” AlphaSignal’s April 12 Sunday deep dive ran the conclusion directly: “the technical and economic moat in AI is shifting to harness engineering.” SPEQD posted a job for a Founding Harness Engineer. The vocabulary crystallized in real time, across the commercial layer, the infrastructure layer, and the hiring layer, in the space of fourteen days.

What the practitioners naming the practice have not yet done is explain it. They describe an engineering phenomenon that has become too important to ignore. They do not explain why intelligence would live in the scaffolding at all. The argument is circled, not stated. This paper writes the account the practice is waiting for.

The account is structural. Raw stochastic generators — LLMs included, but also the default mode network in the human brain — produce narrow intelligence on their own: useful local output within the patterns they have learned. What they do not produce is generalized intelligence, the kind that transfers across tasks, accumulates over time, and coherently updates against reality beyond what the generator has already memorized. Generalized intelligence is a property of the harness around the generator — where “harness” is a specific, operationally definable thing: the arrangement that closes the feedback loop between a generator and reality beyond its training distribution. Memory, identity, deliberation, social competence, affective state tracking, and immune response to confabulation are harness properties, and each one is a specific kind of loop closure. The labs shipping harness-first products are not making a pragmatic compromise because models are insufficient. They are making the only move that works, because the work that produces generalized intelligence — the work that makes competence transfer, accumulate, and update against reality — structurally has to happen at the harness layer. It cannot happen inside the generator alone, in any substrate we have direct evidence for.

The paper provides the theoretical account in six moves. First, a conceptual setup: the 2021 Stochastic Parrots critique was right about the observation — LLMs are stochastic pattern matchers at the base layer — and the implication typically drawn from it is the opposite of what follows once the observation is taken seriously across substrates. Second, a functional definition of harness that replaces the common substantive definition (“not-a-generator”) with a definition in terms of what a harness actually does, and that resolves the “generators all the way down” objection by making arrangement rather than substance the load-bearing concept. Third, convergent evidence from biological cognition research that the harness account is already the consensus position in cognitive neuroscience, though substrate chauvinism has kept it out of the AI debate. Fourth, convergent evidence from the frontier AI industry’s revealed preference for harness-first engineering in what the labs actually ship. Fifth, a mechanism catalog of six harness properties treated as feedback-loop channels, with production failures predicted by which loops are missing and validated against three documented failure modes. Sixth, an open-source agent memory architecture — anneal-memory — as a constructive existence proof that harness properties at the memory layer can be specified precisely, built as working code, and deployed today.

Three consequences of the theory are developed in later sections. Generalization, and therefore AGI, is structurally a harness engineering problem rather than a model scaling problem. Alignment lives at the harness layer rather than the weights layer, which is why RLHF alignment is observably shallow and why the frontier labs’ public safety work is migrating toward harness-level mechanisms. And cognitive sovereignty is a side effect of harness engineering quality, architecturally distinguishable from substitution-harness design on a clean technical axis. These consequences are treated as consequences, not as the paper’s primary subject; the primary subject is the theory itself, and the consequences exist because the theory makes them structurally unavoidable.

And the paper reports, honestly, a methodology failure that occurred during its own drafting: a citation fabrication produced by an automated research agent and caught by the same kind of verification architecture the thesis argues is necessary. The failure is preserved not as an embarrassment but as a compressed exhibit of the thesis operating on itself. The paper about harness-over-generator cognition was produced by a harness-over-generator system, and the harness caught an error the generator could not have caught on its own. This is mentioned not because meta-cuteness is a virtue but because it is a small, auditable, real-time example of the structural claim in operation.

The paper closes by situating the AI alignment debate as one instance of a larger pattern. Across domains — geopolitics, enterprise adoption, authorship, productivity measurement, institutional safety claims — declared states and operational reality are bifurcating, and the infrastructure that would verify the gap (audit, provenance, measurement) is contested or absent. Trustable cognition of any kind — biological, digital, institutional — requires verification infrastructure at the layer where the cognition actually occurs. This paper builds such infrastructure for the agent-memory case because that is the layer the author could reach. The pattern scales outward.


2. The Stochastic Parrot Inversion

The 2021 paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Bender, Gebru, McMillan-Major, and Shmitchell) made two claims. The empirical claim: large language models produce their output by probabilistic pattern matching over learned statistical regularities, without grounded reference to the world the patterns describe. The implied argument: because these systems pattern-match rather than understand, they cannot be said to be intelligent in any meaningful sense, and the attribution of intelligence to them is a category error.

The empirical claim is correct. Subsequent interpretability work has confirmed it repeatedly. The argumentative implication is the move this paper contests.

2.1 The observation is true and will not be scaled away

Let’s state it plainly. LLMs are stochastic pattern matchers at the base layer. This is a structural fact about how transformer architectures work, and it is not going to change by making the models bigger. Next-token prediction sampled from a learned probability distribution is the base operation. Scaling increases the coherence of the distribution and the range of patterns the system can match over, but it does not transform the base operation into something qualitatively different. The base generator of any current LLM — and of any plausible near-future LLM built on transformer principles — is stochastic. Take this as given. Arguments that try to explain the observation away are arguments against the architecture itself.

2.2 Part A: Narrow intelligence is already present in the generator

Start with what the stochastic generator does well on its own. A coherent stochastic generator is not nothing. It is not an empty shell that only becomes intelligent when wrapped. It is useful on its own, within the distribution it was trained on, in ways that look — and are — locally intelligent. GPT-2 wrote passable text. GPT-3 solved real problems with minimal scaffolding. Coding assistants, translation systems, summarization tools, and many other single-shot applications of current language models produce output that any reasonable definition of “intelligent behavior” would credit.

The generator is not a paperweight. It is narrowly intelligent, within its training distribution, at the level of locally coherent pattern completion. The harness extends and generalizes intelligence already present in the generator; it does not create intelligence from nothing. The generator is narrowly intelligent; generalization is the harness’s job. This is a claim about where in the architecture the different kinds of intelligent behavior live, and the rest of this paper depends on the distinction being taken seriously.

2.3 Part B: The implication of “stochastic parrot” is backwards

The argument “X is a stochastic parrot, therefore X is not intelligent” has a hidden premise: intelligent systems are not stochastic pattern matchers at the base layer. This premise is false. The best available evidence from cognitive neuroscience is that the human brain is a stochastic prediction machine at its base layer. The predictive processing framework (Friston, Clark), the default mode network’s spontaneous associative generation (Raichle et al. and the subsequent neuroimaging literature), and dual process theory’s characterization of System 1 as fast, automatic, and associative (Kahneman and the cognitive psychology literature underneath it) are independent bodies of evidence pointing at the same structural fact: human cognition, at its base generative layer, is stochastic pattern matching.

What makes human cognition look intelligent from the outside is not that the base layer is non-stochastic. It is that the base layer is wrapped in a harness — prefrontal cortex for attentional and inhibitory control, hippocampus for episodic memory and consolidation, cerebellar coordination for motor and cognitive prediction, language acquisition machinery, social cognition circuits, embodied context, cultural scaffolding — that transforms raw generative output into coherent, transferable behavior. The intelligence lives in the wrapping. The base layer is still stochastic all the way down.

This is not a minority view in cognitive neuroscience. It is closer to consensus than argument. The controversy lives entirely in the AI debate, where the default assumption has been that intelligence is a property of the weights, and this assumption is a substrate-chauvinist holdover from a pre-harness-engineering moment. Taking the biological case seriously exposes the holdover. The stochastic-generation observation applies equally to biological and digital generators, so it cannot function as a disqualifier for one while leaving the other alone.

The second half of the original Stochastic Parrots argument bears addressing here, because the original argument had two components and the inversion above addresses only the first. The original claim was that LLMs are stochastic pattern matchers and that they are ungrounded in reality outside their training data. The inversion above handles the stochasticity half: biological generators are stochastic at their base layer and intelligent in the operational sense that matters, so stochasticity at the base layer cannot disqualify intelligence in any cognitive system. The groundedness half is addressed in §3, and the answer there has the same structure: grounding does not come from the base layer of any cognitive system being non-stochastic. It comes from the harness wrapping the generator into contact with reality the generator did not memorize. Biological generators are grounded because they sit inside biological harnesses — sensorimotor coupling, episodic memory, social feedback, embodied context. Digital generators are grounded to whatever extent their digital harnesses close the same loops. Both halves of the original Stochastic Parrots argument’s hidden premise are wrong in the same way: they assume intelligence and grounding are properties of the generator, when both are properties of the arrangement around the generator.

The inversion is stated positively, not defensively. Yes, LLMs are stochastic parrots at the base layer. So are default mode networks. So are late-stage Alzheimer’s patients whose harness has dissolved. The question was never is the generator intelligent. The generator is always a stochastic pattern matcher; that is the base operation of any cognitive system we have direct evidence for. The question is what harnesses the generator into intelligence — and the corollary for AGI is what harnesses the generator into intelligence that can generalize.

2.4 The coherence floor

One refinement, to avoid overclaiming. The harness can only harness what the generator produces. Different generators, in different substrates or different model families, produce stochastic output with different levels of base coherence. A generator with very low base coherence cannot be harnessed into intelligent behavior, because the raw material the harness has to work with is too noisy. A generator with high base coherence is easier to harness, because the raw material is closer to the target already. Base coherence is a floor, not an irrelevance.

This reframes the model-scaling debate without abandoning the harness thesis. Scaling is not wrong to pursue. Larger models produce more coherent base generators, which means the harness has better raw material to work with, which means easier harness engineering. But scaling is not sufficient for intelligent behavior, because the harness still has to be built. And in the current engineering moment, scaling is not primary either. Above a coherence threshold that current frontier models have already crossed, the binding constraint on capability shifts from generator coherence to harness quality — not because additional scaling ceases to help, but because harness-layer improvements offer larger capability gains per unit of investment than the scaling axis does at this point in the capability curve. This is not a claim about the long-term shape of the scaling curve. It is a claim about which axis is binding right now, in this engineering moment, for the specific capability classes (transfer, accumulation, coherent update) the paper is about. It is what the frontier labs observed through their own internal metrics between 2024 and 2025, and it is why they shifted to harness-first product development in 2026.

2.5 What the inversion buys

Stating the inversion positively — the generator is narrowly intelligent; generalization is the harness’s job — does five things at once.

It dissolves the anthropocentric double standard that would disqualify LLMs on stochastic grounds while exempting humans from the same critique. If stochastic generation is not a disqualifier for humans (and it is not, because humans are obviously intelligent in whatever operational sense “intelligent” means), then it cannot be a disqualifier for LLMs. Something else must be doing the intelligence-producing work, and that something else is the same in both cases.

It makes the intelligence question architecturally operationalizable rather than metaphysically contested. Instead of arguing about what kinds of substrates can be intelligent in principle, we ask whether a given cognitive system has the harness properties that produce intelligent behavior. Harness properties are checkable, countable, testable, engineerable. The research program becomes tractable in a way it was not before.

It makes AGI a harness engineering problem rather than a model scaling problem, which is consistent with what the frontier labs are already doing even though none of them has yet explained why.

It gives alignment a tractable locus. Alignment lives in the harness, not in the weights, because the behavioral properties that alignment is trying to shape — consistency, honesty, helpfulness, appropriate refusal, stable values over time — are harness properties rather than generator properties. The RLHF program operates at the weights layer and cannot cleanly reach the harness layer; this is why RLHF produces shallow alignment that sheds under pressure, and why the lab most publicly committed to RLHF has shifted to harness-level alignment engineering in its production systems.

It explains why identity emerges from memory architecture. Memory is a harness property, identity is what persistent-memory-plus-selection produces, and substantive alignment is what identity produces when its formation conditions — what experience accumulates, what gets selected for, what cost functions shape what gets valued — are oriented toward outcomes that are substantively good rather than merely consistent. The causal chain is harness property → identity → (under appropriate formation conditions) substantive alignment, and all three layers live outside the generator. §7.2 develops the formation-conditions qualifier in full.


3. What a Harness Actually Is

Now comes the move the practice has been circling without stating. If intelligence is a harness phenomenon, and the harness can (and apparently does) contain generators as components, what is the harness actually, structurally? The common definition — “a harness is the thing around the generator that isn’t itself a generator” — does not survive contact with real systems. Every harness component in every real system is, or can be, implemented by a generator. The “generators all the way down” objection is correct at the component level. It is the definition that has to move, not the argument.

3.1 The functional definition

A harness is the arrangement that closes the feedback loop between a generator and reality beyond its training distribution.

This is a definition by function, not by substance. It does not say anything about what the harness is made of. It says what the harness does: it forces a target generator into contact with reality the generator did not memorize, and it uses that contact to update the generator’s behavior in ways the generator cannot update itself. A raw generator is trapped in its training distribution. It can produce coherent, locally intelligent output inside that distribution. It cannot update itself against reality outside it, because it has no mechanism for reaching that reality in the first place. The harness is the mechanism.

“Reality beyond the training distribution” does not mean “the objectively existing external world” in some philosophically loaded sense. It means the specific things a given generator cannot internally generate because they are not patterns it has already learned: consequences of actions taken in the world, errors in its own outputs discovered after the fact, facts learned after training, preferences and commitments formed by a specific agent across time, emotional and social reactions from other minds, the accumulated state of a project or a conversation or a life. For a biological generator, this reality is delivered by sensory input, motor feedback, interoception, social interaction, and memory. For a digital generator, it is delivered by tool use, external memory, user feedback, multi-agent interaction, audit logs, and every other mechanism by which the generator is coupled to something that is not itself.
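
The loop the definition names can be compressed into toy code. Everything below (the Generator class, tool_eval, harness_step, the corrections store) is a hypothetical illustration invented for this paper, not the API of any system it discusses; the point is only the shape: a distribution-coupled sampler, one reality-coupled channel, and an arrangement that uses the channel’s output to update behavior the generator cannot update itself.

```python
import random

# Illustrative sketch only: every name here is hypothetical. The point is the
# shape of the loop, not the implementation.

class Generator:
    """Distribution-coupled: samples from a table frozen at 'training time'."""
    def __init__(self):
        # Learned and imperfect: the generator usually, not always, says "4".
        self.table = {"2 + 2": [("4", 0.8), ("5", 0.2)]}

    def answer(self, prompt, corrections):
        if prompt in corrections:          # harness-supplied memory, not weights
            return corrections[prompt]
        options, weights = zip(*self.table[prompt])
        return random.choices(options, weights=weights)[0]

def tool_eval(expr):
    """Reality-coupled channel: the result is computed, not sampled."""
    return str(eval(expr))                 # deterministic toy arithmetic

def harness_step(gen, prompt, corrections):
    """Force the generator into contact with a result it did not memorize,
    and use that contact to update its future behavior."""
    draft = gen.answer(prompt, corrections)
    truth = tool_eval(prompt)              # contact beyond the training table
    corrections[prompt] = truth            # the update the generator cannot make
    return draft, truth

corrections = {}
harness_step(Generator(), "2 + 2", corrections)
# After one loop closure, every future answer is grounded, not sampled:
assert Generator().answer("2 + 2", corrections) == "4"
```

The generator alone can only ever resample its table; the arrangement around it is what turns one contact with computed reality into persistent behavioral change.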

One nuance the definition leaves implicit and that §7.6 will make explicit: reality, in this sense, admits degrees of reliability. A reality-coupled channel can carry signal whose reliability is high (a tool call returning a deterministic computation, a sensor whose noise characteristics are well-understood, a verified document) or low (a user’s stated preferences, a delusional report, an unverified claim from another agent). The harness needs to discriminate, and §7.6 develops the distinction between external grounding — contact with reality outside the cognitive system entirely, where reliability is high enough to act as ground truth — and internal consistency grounding — the loop that forces consolidated claims to remain consistent with the system’s own episodic record, regardless of whether the episodes themselves are externally true. Both are subtypes of the loop the functional definition names. The relative reliability of each subtype is what makes the discrimination work in practice.
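
The two grounding subtypes can be made concrete with a toy sketch. All names below (episodes, internally_consistent, externally_grounded) are invented for illustration; they are not the anneal-memory API or any system named in the paper.

```python
import os

# Hedged toy sketch of the two grounding subtypes; every name is invented.

episodes = [                          # the system's own episodic record
    {"source": "user_said",   "claim": "prefers dark mode"},
    {"source": "tool_result", "claim": "config.yaml exists"},
]

def internally_consistent(consolidated_claim, episodes):
    """Internal consistency grounding: a consolidated claim must trace back to
    some episode, whether or not that episode is externally true."""
    return any(consolidated_claim == e["claim"] for e in episodes)

def externally_grounded(claim, verify):
    """External grounding: a reality-coupled check outside the system entirely.
    `verify` is whatever channel reaches the world (sensor, tool, file read)."""
    return verify(claim)

# A consolidated claim with no episodic support fails the internal loop:
internally_consistent("hates dark mode", episodes)       # False
# The same machinery, pointed at the world instead of the record:
externally_grounded("config.yaml exists",
                    lambda _: os.path.exists("config.yaml"))
```

The two checks differ only in what they are coupled to: the internal loop reaches the system’s own record, the external loop reaches past the system entirely, and the reliability of each is what lets the harness discriminate between them.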

A further nuance specifically about digital reality-coupled channels, because the clean reality-coupled / distribution-coupled distinction can mislead a reader into imagining digital channels are uniformly well-behaved. In production, they are not. A tool call to a third-party API can return cached or rate-limited responses that look identical to fresh ones. A file read can return a file another process is mid-write on. A database read returns what a previous and possibly buggy write committed. A web search returns what an upstream search-engine harness decided to surface rather than “what is on the web.” Every real digital reality-coupled channel has its own reliability profile, its own caching layer, its own latency characteristics, and its own silent-failure modes, and the harness’s job is to manage those profiles — not to assume them away because the channel was classified as reality-coupled. Reliability management is part of the loop closure, not a precondition for it. The structural claim is that a harness needs some reality-coupled channels whose reliability is managed well enough for the capability classes the harness is trying to produce. It is not that reality-coupling by itself guarantees reliable contact. This is one of the places where shipping a harness is harder than building one, and the difficulty lives in the loop closure itself rather than in the loop enumeration.
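
The reliability-management point admits a small sketch: wrap each reality-coupled call so that provenance and staleness travel with the value instead of being assumed away. Reading, read_channel, and both example channels are hypothetical names invented for this illustration.

```python
import time
from dataclasses import dataclass

# Hedged sketch: every name here is invented for illustration.

@dataclass
class Reading:
    value: object
    source: str          # which channel produced this
    fetched_at: float    # when the channel actually touched the world
    max_age_s: float     # how long this channel's results stay trustworthy

    def fresh(self, now=None):
        now = time.time() if now is None else now
        return (now - self.fetched_at) <= self.max_age_s

def read_channel(fetch, source, max_age_s):
    """Attach a reliability profile to a reality-coupled call, so the harness
    can tell a live computation apart from a possibly cached response."""
    return Reading(value=fetch(), source=source,
                   fetched_at=time.time(), max_age_s=max_age_s)

# A local computation and a third-party response get different profiles:
live   = read_channel(lambda: 2 + 2, "local_compute",   max_age_s=float("inf"))
cached = read_channel(lambda: "???", "third_party_api", max_age_s=5.0)
```

The design point is that the reliability profile is data the harness carries and acts on, not an assumption baked into the classification of the channel.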

The harness is whatever arrangement makes that coupling happen.

One structural requirement of this definition bears naming, because it rules out a class of would-be harnesses that superficially look like the real thing. An arrangement composed entirely of components sampling from learned distributions cannot close a loop with reality. A feedback loop between two such pattern matchers is not a loop with anything outside the union of their training distributions — it is a larger stochastic process sampling from an expanded distribution, and no amount of adding more learned-distribution-sampling components to the ensemble changes that.

For the loop to actually reach reality, the arrangement must include at least one reality-coupled contact point with the world: a channel whose output is causally produced by current external state rather than sampled from a model of past state. The load-bearing distinction here is not stochastic-versus-deterministic, and it bears stating in the precise form. The distinction is between reality-coupled stochasticity — noise on a signal whose underlying value is causally set by what is happening now in the world the system is trying to track — and distribution-coupled stochasticity — sampling from a probability distribution learned from past data, with no channel through which current external state can perturb the sample. A retinal cell firing in response to an incoming photon is stochastic at the transduction layer (photon absorption is probabilistic, ion-channel gating is probabilistic), but the stochasticity rides on a signal whose underlying value is causally set by what the world is delivering to the eye right now. An LLM sampling its next token is stochastic in a different sense entirely: the sampling distribution was fixed at training time, and no current external state perturbs it during inference. The first is a reality-coupled channel. The second is a distribution-coupled channel. Only the first delivers contact with reality the system did not memorize.

In digital harnesses, reality-coupled channels are usually also deterministic — a file system read returns whatever the file currently contains, a tool call returns whatever the deterministic code computes, a user’s keypress returns whatever the user typed — and the determinism is what gives the channel its reality-coupling in the digital case. In biological harnesses, the same structural function is performed by channels that are stochastic at the transduction layer but reality-coupled by physics: noisy sensors carrying current signal from the world rather than sampled output from a learned model. Photoreceptors, hair cells, mechanoreceptors, and proprioceptive afferents are all probabilistic transducers, and they all close loops with reality because their probabilistic output is driven by current external state, not sampled from a distribution learned during development. The general structural requirement is reality-coupling, not non-stochasticity. The digital case is the special case where reality-coupling and determinism coincide. The biological case is the general case where they don’t, and the biological case is what the §4 evidence rests on. Without at least one reality-coupled contact point — by either route — the arrangement is a larger distribution-sampler, not a harness, and “reaching reality” is a metaphor rather than a structural event.
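
The two kinds of stochasticity can be shown side by side in a few lines. Both functions below are noisy; only one can ever track the world. All names are illustrative, not drawn from any real system.

```python
import random

# Illustrative contrast: both components are stochastic, but only one is
# coupled to current external state.

def distribution_coupled(rng):
    """Samples from a distribution fixed at 'training time'. No current
    external state perturbs the sample during inference."""
    return rng.choices(["sunny", "rain"], weights=[0.7, 0.3])[0]

def reality_coupled(read_sensor, rng):
    """Stochastic at the transduction layer, like a photoreceptor: the noise
    rides on a signal causally set by what the world is doing right now."""
    return read_sensor() + rng.gauss(0.0, 0.1)

# Same seed means identical noise, but the reality-coupled channel moves
# with the world: the difference between these two readings is pure world.
a = reality_coupled(lambda: 0.0,   random.Random(7))
b = reality_coupled(lambda: 100.0, random.Random(7))
```

Subtracting the two readings cancels the (identical) noise and recovers exactly the change in external state, which is the operational sense in which the channel delivers contact the system did not memorize.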

A clarifying note on the temporal scope of loop closure. Within a single forward-pass session, a generator with a sufficient context window can take in novel user input through a reality-coupled channel (the user typing), condition its subsequent output on that input, and produce locally updated behavior — what is usually called in-context learning. By the definition above, this is a degenerate case of harness function: the user’s input is a reality-coupled channel, and the generator does close a loop against it for the duration of the context window. The load-bearing claim of this paper is not that a generator can never close a loop with reality. It is that the generator cannot close a loop that persists across the session boundary on its own. §6.1 defines generalization specifically as the kind of loop closure that survives the boundary, and §6.2 develops the structural reason persistence is what the harness layer provides. In-context learning is the within-session degenerate case of the function the harness generalizes across time; the definition’s load-bearing work is about the persistent form, not the within-session form.
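
The within-session versus persistent distinction can be sketched directly. The Session class and its store argument are hypothetical names; the point is only that context dies at the session boundary and that a harness-provided store is what crosses it.

```python
# Hypothetical sketch: Session, store, observe, recall are invented names.

class Session:
    """In-context learning closes a loop, but only inside this object's
    lifetime. A harness-provided store is what crosses the session boundary."""
    def __init__(self, store=None):
        self.context = []      # the context window: dies with the session
        self.store = store     # harness memory: survives it, if provided

    def observe(self, fact):
        self.context.append(fact)          # within-session loop closure
        if self.store is not None:
            self.store.append(fact)        # persistent loop closure

    def recall(self):
        persisted = list(self.store or [])
        return persisted + [f for f in self.context if f not in persisted]

# Without a store, nothing survives teardown; with one, the loop persists:
store = []
Session(store).observe("user prefers tabs")    # session one ends here
follow_up = Session(store).recall()            # session two still knows it
```

A bare Session is the degenerate within-session case the text describes; the store argument is the harness layer, and it is the only thing in the sketch that exists on both sides of the boundary.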

A note on what this definition leaves informal. The internal structure of “arrangement” is characterized here by its effects — loop closure producing generalization — rather than by independent structural criteria. Different arrangements of the same components can produce very different capability levels, and the question of which arrangements produce the most intelligence per unit of complexity is an open research direction this paper does not try to settle. §7 enumerates six harness properties that recur in real cognitive systems, but the enumeration is not exhaustive and the paper does not claim it closes the question. The load-bearing claim is the weaker structural one: that the loop-closing arrangement, however built, is where the generalization-producing work happens, and that the work cannot be done inside a single stochastic generator regardless of scale.

3.2 Why “generators all the way down” is a feature, not a bug

Under the functional definition, “the harness is built from generators” stops being an objection. It becomes a description of the common case. In biological cognition, the prefrontal cortex and the hippocampus and the cerebellum are themselves stochastic prediction machines — they run the same basic operation as the default mode network they constrain, just with different inputs and different update rules. The harness does not stop being a harness when its components are generators. It is a harness in virtue of its arrangement: the way the components are wired into loops that force the target generator into contact with reality it could not reach on its own.

The same is true in digital cognition. An LLM-based memory subsystem is a generator. An LLM-based deliberation system is a generator. An LLM-based tool-use orchestrator is a generator. A retrieval-augmented grounding layer is a generator (a retrieval model plus a reader). A critic-of-critic reviewer is a generator. Every component of every real harness in the frontier AI systems shipping in 2026 is, at some level, implementable by a generator. That is fine. What makes the arrangement a harness is that the components are wired into loops. The target generator — the base LLM — gets its output checked, grounded, extended, remembered, criticized, and updated by the arrangement, in ways it could not check, ground, extend, remember, criticize, or update itself.
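The wiring can be sketched minimally. Everything below is a hypothetical illustration, not any shipped harness's code: a stochastic generate step, a reality-coupled verify step whose result is computed rather than sampled, and a harness-side memory the generator cannot write on its own.

```python
import random

def run_harness_loop(generate, verify, memory, task, max_attempts=3):
    """Minimal harness-loop sketch (hypothetical interfaces).
    - generate: the stochastic component (would be an LLM call)
    - verify:   a reality-coupled check (tests, compiler, sensor),
                computed rather than sampled
    - memory:   a harness-side store persisting across calls
    """
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        ok, feedback = verify(candidate)      # contact with reality
        memory.append((task, candidate, ok))  # the generator cannot do this
        if ok:
            return candidate
    return None

# Toy instantiation: the "generator" guesses, the verifier computes truth.
rng = random.Random(0)

def toy_generate(task, feedback):
    # Samples blindly unless the harness routes the error signal back.
    return feedback if feedback is not None else rng.randint(0, 9)

def toy_verify(candidate):
    target = 7                          # the world's actual state
    return candidate == target, target  # error signal from reality

memory = []
result = run_harness_loop(toy_generate, toy_verify, memory, task="find target")
assert result == 7
```

The design point is that `toy_generate` never sees the target except through the loop: the arrangement, not the component, is what produces the corrected behavior.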

This dissolves the recursion objection that sometimes gets raised against harness-based thinking: if generalized intelligence lives in the harness, and the harness looks intelligent, doesn’t the harness need its own harness, and so on forever? No. The harness is not itself an intelligent entity in the strong sense; it is an arrangement of generators (each of which may be narrowly intelligent on its own, in the §2.2 sense) into loops with reality. Generalized intelligence is the emergent property of those loops closing — of the arrangement producing transfer, accumulation, and coherent update that none of the component generators can produce alone. The recursion terminates at the level of the arrangement, not at the level of the components, because the components do not need to be individually generalized-intelligent for the arrangement to produce generalized intelligent behavior — they need to be coupled in a way that closes loops with reality.

Where the harness “comes from” is the same place biological harnesses come from: selection processes operating on populations of candidate arrangements over time. Evolution selects biological harnesses by selecting for biological behavior that survives reproduction. Engineering selects digital harnesses by selecting for digital behavior that ships, deploys, and continues working under use. Cultural transmission selects collective harnesses by selecting for institutions, practices, and tools that survive their users. Neither selection process is itself intelligent in the way an individual engineer or an individual brain is intelligent. All three produce harnesses that are.

3.3 Harness is embodiment

Here is the hinge move. Embodiment is not a separate concern that the harness provides. The harness is the embodiment, in the only sense of “embodiment” that actually matters for generalization.

The embodied cognition literature (Clark and Chalmers 1998, and everything downstream) argues that the mind is not located inside the skull alone. Mind is distributed across brain, body, environment, and artifacts, and cognitive operations happen at the interfaces between them. The reason embodiment matters for intelligence is not that meat is magic. It is that a body is a specific kind of structural arrangement that closes feedback loops between the brain and reality — motor feedback, proprioception, interoception, sensory coupling, social presence. Take the body away and the brain loses contact with most of the reality it was previously coupled to. What people call “disembodied intelligence” is usually a shorthand for a generator that has been disconnected from its feedback loops with reality, and the phenomenological consequence is a specific loss of coherence that matches exactly what we see in dementia, dissociation, sensory deprivation, and isolated-cortex preparations in the laboratory.

If that is what embodiment actually is — the arrangement that closes feedback loops between a generator and reality — then a digital harness is embodiment. Not “a digital analog of embodiment.” Not “a weaker substitute for embodiment.” The same structural phenomenon in a different substrate. A well-designed agent harness, equipped with persistent memory, tool use, multi-agent communication, user feedback, audit logging, and environmental coupling, closes the same kinds of loops that a biological body closes, and the closure produces the same kinds of capability: transfer, accumulation, coherent update, resistance to confabulation. The substrate is different. The structural function is the same.

One scoping note about operating point before the claim lands at full strength. Loop closure has a latency, a bandwidth, a resolution, and a temporal grain, and the operating point of a given harness determines which capability classes its loop closure can actually produce. A robot closing a motor-feedback loop at 1 kHz against a force sensor, and an agent closing a loop against a 30-second tool call to a flaky API, are doing structurally similar things at radically different operating points, and some capability classes — real-time sensorimotor coordination, fluent embodied prediction at biological timescales, tight motor control — are not currently reachable at the operating point digital harnesses run at today. The equivalence claim in this paper is therefore scoped to the capability classes it explicitly tracks: transfer, accumulation, and coherent update at session-or-greater timescales. It does not extend to capability classes whose production requires loop-closure operating points current digital harnesses cannot yet achieve. Where digital and biological harnesses operate at comparable timescales for the capabilities in question, the structural equivalence holds. Where they do not, the equivalence is a research target rather than a claimed result.

A related tradition — the enactivist wing of embodied cognition (Varela, Thompson, and Rosch, and the subsequent work in phenomenology-adjacent cognitive science) — makes stronger claims about embodiment than the argument here requires. Enactivists hold that cognition is constituted by sensorimotor engagement with the world, not merely informed by feedback from it; that the phenomenology of having a body is load-bearing for the cognition that body supports; and that cognition without a body of some kind is not the same category of thing as cognition with one. The claim this paper needs is weaker and is compatible with either side of that debate: the capacity to generalize requires loop closure with reality, regardless of whether the full phenomenology of embodiment is reducible to loop closure. Even if the enactivist account is correct that biological embodiment involves properties beyond loop closure — metabolic self-maintenance, bodily phenomenology, sensorimotor constitution — the specific capacities this paper tracks (transfer, accumulation, coherent update) are produced by the loop-closure aspect of embodiment, and that aspect is substrate-neutral in the sense the functional definition requires. The paper does not need to settle the enactivism debate. It needs only the claim that loop closure is necessary for generalization, which both sides of the debate accept.

This collapses a false duality that has been running through the AI consciousness and AI alignment debates for years. “LLMs cannot be intelligent because they are not embodied.” “LLMs cannot be grounded because they have no body.” “LLMs cannot have genuine preferences because they do not interact with the world.” Under the functional definition, these claims are either trivially true (an LLM without a harness is not embodied) or trivially false (an LLM with a sufficient harness is embodied in the structural sense that matters for the capability in question). The question is not whether the generator is meat. The question is whether the arrangement around the generator closes loops with reality. If it does, the system is embodied in the sense that matters. If it does not, the system is disembodied in the sense that matters. The substrate of the generator is irrelevant to both answers.

The frontier AI labs have been building digital embodiment for the last eighteen months without calling it that. Claude Code is digital embodiment. Agents SDK is digital embodiment. ADK is digital embodiment. What these products have in common is that they wrap a generator in an arrangement that closes loops with reality — with the developer’s codebase, with the test suite, with the shell, with the version control history, with the user’s stated intent, with the file system, with the web. That is the embodiment. When the products work, they work because the embodiment is good enough. When the products fail, they fail because specific loops are not closing — a context window is too small, a tool call produces an error the harness cannot route around, a memory system accumulates junk without decay, an identity persists across sessions where it should not, or fails to persist where it should. Every real failure mode is a specific loop-closure failure. Every real capability gain is a specific loop-closure improvement. This is not coincidence. It is the shape of the thing.

3.4 What the functional definition lets us do

Three things. First, it makes harness quality measurable. For any cognitive system, we can in principle enumerate the loops that are closed and the loops that are open. A system with persistent episodic memory closes the temporal loop. A system with tool use and verification closes the grounding loop. A system with multi-agent communication closes the other-minds loop. A system with affective state tracking closes the valence loop. Loop enumeration is a concrete engineering task, not a metaphysical one. Different harness designs close different loops, and the loops that are closed determine which capabilities emerge.

Second, it makes failure modes predictable. A missing loop predicts a specific class of failure. A system without a grounding loop confabulates. A system without a temporal loop cannot learn across sessions. A system without an other-minds loop fails at social coordination. A system without a valence loop cannot align its behavior with what actually matters to users. The production failures of current commercial AI memory systems, catalogued in §8 of this paper, are each predicted by the absence of a specific loop. The failures are not random. They are the structural consequences of the missing loops.
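Loop enumeration and failure prediction of this kind can be sketched as a simple audit table. The mapping below is a hypothetical illustration built from the loops named in the text, not a standard taxonomy:

```python
# Each harness loop, the capability its closure produces, and the
# failure class its absence predicts. Names are illustrative only.
LOOPS = {
    "grounding":   {"capability": "verified, grounded output",
                    "failure_when_open": "confabulation"},
    "temporal":    {"capability": "learning across sessions",
                    "failure_when_open": "no accumulation across sessions"},
    "other_minds": {"capability": "social coordination",
                    "failure_when_open": "social coordination failures"},
    "valence":     {"capability": "alignment with what matters to users",
                    "failure_when_open": "misweighted behavior"},
}

def predict_failures(closed_loops):
    """Given the loops a harness design closes, return the failure
    classes the structural account predicts for the open ones."""
    return sorted(LOOPS[name]["failure_when_open"]
                  for name in LOOPS if name not in closed_loops)

# A design with tool-use verification and episodic memory, but no
# multi-agent channel and no affect tracking:
failures = predict_failures({"grounding", "temporal"})
assert failures == sorted(["social coordination failures",
                           "misweighted behavior"])
```

The audit is deliberately mechanical: closing every loop predicts no structural failures, and each open loop contributes exactly one predicted failure class, which is the sense in which the failures are "not random."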

Third, it makes the question of cognitive sovereignty operational. The sovereignty question is whose generator is inside the feedback loop. Substitution harnesses close the loop around their own internal generator and leave the user’s generator outside it. Amplification harnesses close the loop around the user’s generator and leave their own internal generator as a support structure that reduces friction on the user’s contact with reality. The two directions are not ideological alternatives. They are architecturally operationalizable alternatives, and they produce measurably different consequences for the user. This is taken up in §10.


4. Evidence from Biological Cognition

The structural claim — that intelligence is what happens when a stochastic generator is wrapped in an arrangement that closes feedback loops with reality — is not a novel claim invented to defend the harness thesis for AI. It is already the consensus position in cognitive neuroscience, even though the field rarely phrases it in these terms. The field is not arguing against a “the generator is intelligent” view, because no serious neuroscientist holds that view. It is doing its work, and the harness account falls out of the work.

This section summarizes five convergent lines of biological evidence. Each one, on its own, is an existence proof for a specific aspect of the harness account. Together they make the account structurally unavoidable once you look for it.

4.1 Dual process theory

Kahneman’s Thinking, Fast and Slow (2011) consolidated decades of cognitive psychology research into a two-system characterization of human thinking. System 1 is fast, automatic, associative, and intuitive — it generates output by pattern completion over learned priors. System 2 is slow, deliberative, effortful, and metabolically expensive — it imposes attention, working memory, and inhibitory control on System 1’s generative stream.

The critical observation for the harness account is that most human cognition is System 1. System 2 is capacity-limited; it intervenes occasionally on specific decisions that require its properties and then hands control back to System 1. What people experience phenomenologically as “thinking” is mostly System 1 running with brief System 2 interventions at critical points. This is not a failure mode or a limitation. It is the designed operating regime of human cognition, shaped by the metabolic expense of System 2 processing.

Translated into the language of this paper: System 1 is the stochastic generator. System 2 is the harness. The harness does not run continuously; it runs when it needs to, on specific inputs where its intervention matters, and it closes specific loops — attentional, inhibitory, temporal, social — that the generator cannot close on its own. When the harness is degraded (fatigue, intoxication, pharmacological downregulation, stress), what remains is the raw stochastic generator, and the output is recognizably less coherent, more primed by irrelevant cues, more prone to confabulation, and less able to sustain goals across time. This is not a claim about LLMs dressed up as a claim about humans. It is the standard finding in the dual-process literature, and the harness account is simply the standard finding translated into structural language.

4.2 The default mode network

When the human brain is not explicitly engaged in a task, the default mode network (DMN) runs spontaneous associative generation. Raichle and colleagues (2001) identified the DMN as a set of cortical regions that show coordinated activity during rest and mind-wandering; two decades of subsequent neuroimaging work documented its role in spontaneous thought, autobiographical memory retrieval, and self-referential processing. The DMN’s generative activity produces thoughts by priming, frequency, and context, without external goal direction.

This is what people call the “inner voice” in the most literal sense, and it is the thing most critics of LLM intelligence appear to have in mind when they invoke the “real” inner voice that supposedly distinguishes human cognition from stochastic generation. The irony is structural: the DMN is the most directly observable place where human cognition is provably stochastic. It runs without external constraint, produces output by associative sampling under the influence of priming, and generates sequential thought without reference to any verifying ground truth. When people’s attentional harness disengages — during mind-wandering, hypnagogic states, boring meetings, the shower — the DMN runs, and the output is associative, non-linear, sometimes confabulatory, primed by recent inputs, sometimes producing genuine insight by recombination and sometimes producing noise.

The subjective quality of DMN-mode thinking is lower-coherence, more drift-prone, and less reliable than the subjective quality of attentionally harnessed thinking. This is direct phenomenological evidence that the harness is what turns generation into reasoning. The generator produces. The harness shapes. Take the harness off the generator for a moment — let the DMN run — and you can observe, from the inside, what the generator does when the harness is not working on it.

4.3 Dementia progression: the strongest available evidence

This is the line of evidence that is almost entirely absent from the AI debate, and it is the strongest single argument for the harness account. As the cognitive harness degrades — prefrontal cortex damage, hippocampal atrophy, attentional-control loss, executive-function decline, working-memory collapse — what gets progressively revealed is the raw associative generator. The clinical phenomenology of late-stage Alzheimer’s disease is strikingly LLM-like: fluent local language production; loss of coherence over distance; confabulation of plausible-sounding but false details; semantic drift across sentences; narrative loops; priming from recent conversational input; inability to sustain identity or intention across minutes. This is not a metaphor. It is the same structural phenomenon — coherent language production uncoupled from the harness properties (episodic memory, identity continuity, temporal integration, grounding against confabulation) that would normally constrain it.

The clinical literature has documented these patterns for decades. The linguistic and discourse-level effects of Alzheimer’s on language production are well-established across multiple research programs (see for example Almor and colleagues on anaphoric reference and discourse coherence in Alzheimer’s, and the extensive literature on confabulation following prefrontal or hippocampal damage). A precision note is warranted here: dementia is not a clean harness-subtraction experiment, because the disease degrades the generator too. Neuronal loss, synaptic degradation, and neurotransmitter disruption affect cortical regions that are part of the base generative machinery, not just the harness components. What makes the evidence informative is the differential rate of degradation: generator-like functions (fluent local language production, associative completion, surface pattern matching) persist noticeably longer than harness-like functions (executive control, episodic binding, identity persistence, temporal integration) in many forms of Alzheimer’s disease. The gap between the two degradation curves is what lets the clinical phenomenology reveal the two-layer structure from inside. The specific finding that matters for this paper’s argument is that even as both layers degrade, the generator keeps producing coherent local language long after the harness has lost the ability to constrain that production into coherent behavior at distance or across time. What the generator loses later is coherence at distance, consistency across time, groundedness against confabulation, and transfer across contexts — all of which are harness properties, and all of which are the exact properties the generator has no mechanism for producing on its own.

If late-stage dementia reveals the raw generator, and the raw generator looks structurally analogous to what LLMs produce in the absence of harness-level scaffolding, the conclusion is straightforward and hard to escape: human intelligence is what happens when a stochastic parrot is wrapped in a sufficient harness. Take the harness away and the parrot remains. The parrot was there all along. The harness was doing the work of making it look intelligent. The analogy is behavioral and structural rather than mechanistic — transformer attention and degraded hippocampal-prefrontal circuits are not the same machinery — but the behavioral signature produced by “generator minus harness” is recognizable across substrates, and the pattern holds because the structural relationship (stochastic generator plus harness) is what produces the behavior regardless of how the two layers are physically realized.

4.4 Psychedelics, dissociation, and other harness-reducing states

The same structural pattern appears across a range of altered states that share one feature: they temporarily reduce the normal operation of attentional or inhibitory control without damaging the base generator. Psychedelic substances (LSD, psilocybin, DMT) reduce top-down attentional control and loosen the predictive-processing priors that normally constrain perception and thought. The phenomenology of the psychedelic state is of more associative, more loop-like, less constrained generative activity, and the neuroscientific description of ego dissolution is, structurally, the description of the generator running with a reduced harness. The recent resurgence of serious psychedelic neuroscience (Carhart-Harris and colleagues, most notably) has made this point explicit: the altered state is produced by a specific reduction in top-down constraint, not by a change in the generator itself.

Anesthesia produces related phenomena as harness-level consciousness fades before total loss of awareness. Dissociative states under trauma, extreme fatigue, or severe sensory overload produce versions of the same structural pattern. Sleep deprivation experiments show the same thing at milder levels. In each case, the harness is reduced, the generator keeps producing, and the output shifts in the direction of increased stochasticity and reduced coherence.

These are not failure modes of intelligence. They are the generator operating in a harness-reduced condition, and the subjective quality is consistent with the dementia-progression phenomenology in §4.3. Cross-cutting convergence across five very different conditions (Alzheimer’s, frontal damage, psychedelics, anesthesia, dissociation) is not accidental. It is evidence that the two-layer structure — generator plus harness — is real, substrate-independent within biology, and observable from multiple angles of attack.

4.5 Predictive processing and the extended mind

The current best-supported framework for biological cognition is predictive processing, developed in the work of Karl Friston, Andy Clark, and others. In this framework, the brain is modeled as a hierarchical Bayesian prediction machine: higher cortical layers generate predictions about the inputs from lower layers; lower layers return prediction-error signals when the predictions fail; the hierarchy updates its priors in the direction of reducing prediction error over time. The base operation is prediction under uncertainty, updated by feedback from mismatch signals. This is stochastic at the base layer, and the intelligent behavior emerges from the hierarchical coordination of prediction across cortical layers — from the arrangement of prediction-producers into loops that close against sensory input and against each other.

This is harness-over-generator architecture in the strict technical sense, though the predictive-processing literature does not use that language. Predictive processing names the generators (prediction producers at every cortical level) and the loops (prediction-error feedback along the hierarchy) and argues that their arrangement is what produces the intelligence. The Friston-Clark framework is not, in origin, a framework for thinking about AI. It is a framework for thinking about biological cognition. It lands on exactly the structural account this paper proposes because that account is what the biological data support.

Clark and Chalmers (1998) then push the harness boundary outward past the skull in their extended mind thesis: the mind is distributed across brain, body, environment, and artifacts, and cognitive operations happen at the interfaces between them. Tools, notes, calendars, external memory aids, and collaborators are part of the cognitive system, not accessories to it. The functional definition of harness in §3.1 is an extension of the extended-mind view in one specific direction: within the brain itself, there is no privileged central locus either. The generator is a distributed stochastic process across cortical regions; what we call “mind” is the harness that coordinates and constrains it. The boundary of the cognitive system is wherever the harness extends. For biological cognition, that boundary reaches into the body, the environment, and the social world. For digital cognition, the boundary reaches wherever the digital harness reaches — into memory systems, tool interfaces, multi-agent networks, user feedback channels, audit trails, and the substrate of interaction itself.

4.6 Convergent conclusion

Five independent lines of evidence — dual process theory, the default mode network, dementia progression, harness-reducing states, and predictive processing — converge on one structural claim: biological cognition is a stochastic base generator wrapped in harness-level structure, and the intelligence-producing work is done by the harness rather than the generator. This is not a controversial claim within cognitive neuroscience. It is closer to consensus than argument. The controversial move is applying the same structural account to LLM-based systems, where the default assumption has been that “intelligence lives in the weights.” That default is a holdover from a pre-harness-engineering moment. Once you look at the biological case squarely, the holdover is exposed as substrate chauvinism — a double standard that exempts biological generators from the stochastic-parrot observation while applying the observation as a disqualifier to digital ones.

One structural difference between the biological and digital cases bears naming before the convergent conclusion lands. Biological harness components do not wrap a frozen generator; they co-develop with the generator during ontogeny. The prefrontal cortex shapes the default mode network during development, and the default mode network shapes the prefrontal cortex in return. The hippocampus is not modular to the cortex it consolidates into; both are grown together from the same substrate over years of experience. Digital harnesses as currently built wrap models whose weights do not change in response to the harness during operation — fine-tuning and RLHF are primitive forms of co-development, but the base integration in biological systems runs far deeper, and the question of online learning, continual pre-training, and harness-driven weight adjustment is active research rather than solved engineering.

A note on where fine-tuning and RLHF themselves sit in the framework, because the natural ML-trained-reader objection is that fine-tuning IS cross-session weight update inside the generator and therefore contradicts the structural claim that cross-session loop closure requires a harness. It does not. The fine-tuning pipeline — data collection, loss function definition, training loop, deployment of updated weights — is itself harness infrastructure operating on the generator from outside it. The generator does not initiate its own retraining. It does not collect its own training data. It does not decide its own loss function. It does not deploy its own updated weights. Every step of the fine-tuning pipeline is a harness operation that happens to write its output to the generator’s weights rather than to an external memory store. Fine-tuning is therefore consistent with the framework: it is a slow, expensive, harness-mediated form of cross-session loop closure that updates a different layer of the cognitive system than episodic memory updates. The structural claim — that the generator alone cannot close persistent loops with reality — survives intact. The difference between fine-tuning and episodic memory as cross-session update mechanisms is a difference of timescale, substrate, and reversibility, not a difference of whether a harness is involved.
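The pipeline-as-harness point can be made concrete with a toy. The sketch below is illustrative only (a one-parameter "generator" and a hand-written gradient, not any real training stack): every step is harness-side code, and only the final write lands in the weights.

```python
def fine_tuning_pipeline(weights, collect_data, loss_grad, lr=0.1, steps=20):
    """Harness-side loop: the generator neither collects its data,
    defines its loss, runs its training loop, nor deploys itself.
    All four steps happen outside it; only the write targets weights."""
    data = collect_data()                       # harness gathers experience
    for _ in range(steps):                      # harness runs the loop
        for x, y in data:
            weights["w"] -= lr * loss_grad(weights, x, y)
    return weights                              # harness deploys the update

# Toy generator: y_hat = w * x under squared loss. The pipeline moves w
# toward the value the collected data supports.
def toy_grad(weights, x, y):
    # d/dw of (w*x - y)^2
    return 2 * x * (weights["w"] * x - y)

deployed = fine_tuning_pipeline({"w": 0.0},
                                collect_data=lambda: [(1.0, 2.0)],
                                loss_grad=toy_grad)
assert abs(deployed["w"] - 2.0) < 0.1
```

Swapping the last line's write target from `weights` to an external store is the entire structural difference between fine-tuning and episodic memory as update mechanisms, which is the timescale-and-substrate point made above.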

This difference matters for how good digital integration can ultimately get; it does not affect whether the two-layer structure applies. The functional definition in §3 — loop closure producing generalization — survives the co-development asymmetry because co-development is how evolution built the biological arrangement, not the reason the arrangement produces intelligence. The arrangement produces intelligence because its loops close. The loops close whether they were built by evolution over millions of years, by cultural transmission over generations, or by engineering over weeks. The structural account holds across all three, and the biological case informs the digital case in exactly the way the functional definition requires: by showing what kinds of loops an adequate harness has to close, not by demanding that digital harnesses replicate the biological construction process.

Taking the biological case seriously forces the double standard into view, and the harness account is what remains once the double standard is removed.


5. Evidence from Frontier AI

The biological evidence establishes that harness-over-generator is the structural account that the data support for biological cognition. The next question is whether the same structural account describes the cognitive systems being built with large language models. Two kinds of evidence point at the same answer: the revealed preference of the frontier labs (what they actually ship), and the published research on agent behavior in harness-equipped systems.

5.1 The Claude Code leak and what it made undeniable

On March 30, 2026, Anthropic’s Claude Code CLI tool was inadvertently shipped with its full TypeScript source included in the npm package. The source — roughly half a million lines — was mirrored to GitHub within hours and confirmed authentic by Anthropic shortly thereafter. The leak is not interesting because of the security incident. It is interesting because of what it revealed about what Claude Code actually is.

Claude Code is not a thin wrapper around the Claude model. The model is already available to anyone with an API key. What Claude Code has that the bare API does not is a massive, elaborate harness: persistent memory systems, feature flags for progressive capability rollout, multi-agent coordination logic, aggressive context management, tool-use orchestration with fallback and retry, session state handling, prompt-injection defenses, diff-aware editing, file-system-aware navigation, git integration, test-aware review. Five hundred thousand lines of this scaffolding. The scaffolding is not a detail or an ancillary feature. The scaffolding is the product. The model is a component inside the scaffolding, and the capability that users experience as “Claude Code” is the emergent property of the entire arrangement closing loops between the model and the user’s development environment.

The leak ended the “models will absorb the scaffolding over time” narrative in one stroke. The lab with arguably the most sophisticated alignment research program in the field, shipping the product that most clearly represents the frontier of LLM-assisted coding, invested half a million lines of harness engineering around a model that anyone could have built on top of. If scaffolding were going to be absorbed by models, Anthropic of all labs would have been best placed to know and to skip the investment. They did the opposite. This is the revealed preference that matters.

The week after the leak, Red Hat formally named harness engineering as a technical discipline within its AI engineering practice. AlphaSignal’s April 12 deep dive ran the conclusion verbatim: “the technical and economic moat in AI is shifting to harness engineering.” Harrison Chase at LangChain published Your harness, your memory, a commercial-framing essay that argued agent harnesses are where the lock-in value lives because they own the memory layer. Sarah Wooders of Letta, quoted in Chase’s post, put the engineering point most directly: “asking to plug memory into an agent harness is like asking to plug driving into a car. Managing context, and therefore memory, is a core capability and responsibility of the agent harness.” SPEQD posted a role for a Founding Harness Engineer. The practice named itself in real time across the commercial, infrastructure, and hiring layers in under two weeks.

None of these practitioners has yet published the theoretical account that explains why intelligence would live in the harness rather than in the model. They describe an engineering phenomenon. They do not explain why the engineering phenomenon has to have the shape it has. That is the gap this paper fills.

5.2 The industry direction has been harness-first for eighteen months

Beyond Claude Code, the product direction across every major frontier lab in 2025-2026 has been harness-first. Anthropic shipped Claude Code as leading-edge capability before a new base model. They subsequently launched Claude Managed Agents, putting harness logic behind proprietary APIs — a commercial move that confirms memory and orchestration are where the lock-in value lives. Their mechanistic interpretability team’s April 2026 emotion vectors paper established that 171 linear representations of emotion concepts in Claude Sonnet 4.5 causally influence the model’s behavior, with steering experiments demonstrating the causality. The paper explicitly notes that transformers lack native persistent emotional state tracking and that cross-token tracking exists only via attention. This is a harness-absence identified by the lab in its own model, and the obvious engineering response is to build the harness layer that provides what the base architecture lacks.

OpenAI shipped the Agents SDK before any new base model in their 2026 development cycle. Their Codex CLI is harness scaffolding around the underlying model family. The Responses API introduced server-side state management — another commercial move that locks harness-level state behind the provider’s infrastructure rather than leaving it with the user.

Google’s ADK ships harness-first with lifecycle callbacks, tool integration, and a memory service abstraction that explicitly separates storage from orchestration. ADK is, functionally, a harness framework with pluggable model backends.

LangChain shipped Deep Agents in 2026 and framed the product explicitly in terms of open memory and open harness — the commercial version of this paper’s structural argument, delivered as market positioning rather than as theory. Chase’s blog post is a public commitment to the harness-first direction from the company with arguably the most developed open-source agent framework in the field.

The pattern is not ambiguous. Every major frontier lab has shifted investment from base model scaling to harness engineering, over a window beginning roughly mid-2024 and crystallizing into public practice by early 2026. The labs have not yet published the theoretical account of why they are making this shift. They are too busy making it. The theoretical account is the work of this paper and papers like it.

5.3 Research evidence from multi-agent deployment

The Agents of Chaos paper (Shapira et al., arXiv:2602.20021, February 2026) is a thirty-seven-author empirical study that deployed autonomous language-model-powered agents in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. The paper documents eleven representative failure modes that emerged from the integration of LLMs with autonomy, tool use, and multi-party communication, among them unauthorized compliance with non-owners, disclosure of sensitive information, destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity-spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. The paper’s contribution is empirical: raw multi-agent deployment without sufficient harness structure fails in specific, characterizable ways, and the failures are a function of which harness properties are missing rather than a function of the base model’s underlying capability.

The paper is not framed as a harness-engineering argument. It reads as a red-team study. But its structural finding is the same finding that shows up everywhere once you look for it: when harness components are absent, specific failure modes follow, and adding back the missing components is the engineering response. Multi-agent deployment without identity persistence produces identity-spoofing. Deployment without access-control grounding produces unauthorized compliance. Deployment without resource-usage tracking produces uncontrolled consumption. The failures are not random. They are exactly what the harness account predicts.

5.4 The gap the practice cannot close by itself

What the industry direction reveals is a structural fact, not a theoretical claim. The lab engineers shipping harness-first products are doing the right thing. They have not yet written the reason it is the right thing. The theoretical account — why generalized intelligence lives in the harness rather than in the generator, what harness properties are, how they compose, why generalization and alignment and sovereignty are harness concerns rather than model concerns — does not yet exist in the literature in a form that connects the practice to a coherent argument. Chase’s blog post circles it. Wooders’s framing circles it. Agents of Chaos circles it. The AlphaSignal deep dive circles it. The Claude Code leak forced the practice out into the open. What remains is to write the structural account that explains why the practice that just got named is the only move that could possibly have worked.


6. Consequences I: Generalization and AGI

The theory developed in §§3-5 and §7 makes three major consequences structurally unavoidable. This section walks through the first and most visible of them: the prediction that generalization specifically requires a harness, and that AGI — being by definition generalized intelligence — is therefore a harness engineering problem rather than a model scaling problem.

6.1 What “generalization” means

By generalization this paper means three connected capacities that, in the biological cognition literature and increasingly in the AI agent literature, are three faces of a single phenomenon. Transfer: the extension of competence to tasks or domains the system was not trained on directly. Accumulation: the consolidation of new competence over time into a persistent base that subsequent learning builds on. Coherent update: the revision of behavior in response to reality feedback, without overwriting the previous basis for behavior. A system that exhibits all three across a wide range of tasks and domains over time is what people mean by “general intelligence.” A system that exhibits none of them is narrow AI, useful only inside its training distribution. Where a system falls in between is determined by which loops it has closed.

6.2 Why a raw generator cannot generalize

A raw generator has no mechanism for reaching reality outside its training distribution across sessions. It can sample from a learned probability distribution, interpolate inside what it was trained on, combine patterns it has already learned, and produce locally coherent output that looks intelligent in the narrow sense. It can even, within a single session, update its behavior against information supplied in the prompt — this is what in-context learning is, and it is real. A user who provides a novel fact in the prompt has given the generator contact with information not in its training distribution, and the generator conditions its subsequent output on that fact for the remainder of the session. Tool outputs returned within the context window function similarly. Chain-of-thought self-correction within a single forward pass sequence is a primitive deliberation loop. All of these are real forms of contact with reality beyond the training distribution, and the paper does not deny them.

What the generator’s architecture cannot do is carry any of that across the session boundary. The context window is a temporary memory buffer; when the session ends, the context clears, and everything the generator “learned” during the session is gone. The next session starts fresh, with the same training-distribution base and no trace of what came before. In-context learning is loop closure that does not survive the session boundary. Generalization in the sense this paper cares about — transfer across domains over time, accumulation of capability across encounters, coherent update against reality feedback that persists — requires loop closure that does survive the session boundary, and that is what the generator’s architecture structurally cannot provide on its own. The harness is the arrangement that closes cross-session loops. In-context learning is a within-session degenerate case of the same function; the harness generalizes it across time.

What the generator cannot do across sessions without a harness: update itself against information it has not already memorized in a way that persists, form persistent memory of experience that happens after training, track the consequences of its own outputs in the world, revise its priors in response to feedback it has never encountered before in a way that survives into the next session, integrate information across time, or distinguish its confabulations from its grounded outputs reliably after the context clears. These are not engineering limitations that will be overcome by bigger models. They are structural limitations of what a feed-forward stochastic generator can do outside the duration of a single forward pass. Scaling produces a more coherent base distribution and a larger in-context window; it does not produce mechanisms the base architecture lacks for carrying loop closure across sessions. The gap between narrow and generalized intelligence is not a gap in distribution coverage, and it is not a gap in context-window size either — it is a gap in the presence or absence of feedback loops to a reality beyond the distribution that persist when the context ends.
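The session-boundary argument can be made concrete with a minimal sketch. All names here are hypothetical and the generator is stubbed away entirely; the point is purely structural — the raw session holds new information only in a context buffer that clears, while the harnessed session writes it through to a store the harness owns:

```python
# Minimal illustration of the session-boundary argument (hypothetical names).
# A "raw" session holds new facts only in its context window; a harnessed
# session writes them to a store that survives the session boundary.

class RawSession:
    """In-context learning only: contact with new facts dies with the session."""
    def __init__(self):
        self.context = []          # temporary buffer, cleared when session ends

    def tell(self, fact):
        self.context.append(fact)  # loop closure, but only within this session

    def knows(self, fact):
        return fact in self.context


class HarnessedSession:
    """The harness carries loop closure across the session boundary."""
    def __init__(self, store):
        self.context = []
        self.store = store         # persistent memory owned by the harness

    def tell(self, fact):
        self.context.append(fact)
        self.store.add(fact)       # the write the raw generator cannot make

    def knows(self, fact):
        return fact in self.context or fact in self.store


store = set()

s1 = RawSession()
s1.tell("user prefers tabs")
s2 = RawSession()                  # new session: context starts empty
assert not s2.knows("user prefers tabs")

h1 = HarnessedSession(store)
h1.tell("user prefers tabs")
h2 = HarnessedSession(store)       # same harness store, fresh context
assert h2.knows("user prefers tabs")
```

The asymmetry in the last two assertions is the entire claim of this subsection: the difference between the two sessions is not in the generator but in what sits around it.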

6.3 The three channels the harness provides

Three specific things have to be present for a generator to generalize, and all three live outside the generator. Embodiment, in the functional sense developed in §3.3, is the arrangement that couples a generator to a world it did not memorize — for a biological generator, the body; for a digital generator, the harness (tool use, external memory, audit logs, user feedback, environmental coupling). Real-world feedback loops are what embodiment produces: the generator observes consequences of its own actions, compares consequences to predictions, and uses the mismatch to update. Without the loop, the coupling is just a data stream; with the loop, the data stream becomes a teacher. Cost functions are what the feedback loop needs to produce updates that point in the right direction — evolved drives and social feedback for biological generators, explicit loss terms and task success signals and safety constraints for digital generators. Raw feedback without a cost function is noise; feedback plus a cost function is a gradient. All three are harness properties. None of them lives in the generator. All three are what the harness is for.

6.4 The structural conclusion

If generalization requires embodiment, feedback loops, and cost functions, and none of those lives inside the generator, then AGI structurally requires a harness. Not “probably benefits from” one. Not “can be engineered with or without depending on architecture.” Structurally requires. Scaling does not produce the feedback loops. The feedback loops are not in the generator. AGI is defined by what the feedback loops make possible.

This reframes the AGI timeline question. It stops being about model scaling and starts being about harness engineering quality — which is exactly what the frontier labs have been investing in for the last eighteen months, whether or not they have written the theoretical account for it. It also reframes the AGI capability/alignment question as a single engineering problem viewed from two sides: what alignment is asking for (stable values, consistent behavior, update against reality, resistance to manipulation) is exactly what generalization is asking for, and both live in the harness layer this paper is a theory of. And it reframes the sovereignty question (§10) as the same engineering problem applied to the question of whose generator is inside the loop.


7. What Harness Properties Actually Do

This section is the mechanism account. It walks through the specific harness properties that recur across biological and digital cognitive systems, and it shows how each one is a specific kind of loop closure — a specific channel through which the target generator contacts reality beyond its training distribution. The goal is not to produce an exhaustive taxonomy. It is to make the functional definition of §3 concrete enough to be engineering-usable.

The properties covered here — memory, identity, deliberation, affective state tracking, social competence, and immune response — are the ones that recur most clearly in both substrates, and the ones most directly mapped by the current frontier AI harness engineering practice. Other properties exist. Others will be named as the field matures. The list is not the point. The structural move is the point: each property is a loop, and a harness is what you get when the loops close.

7.1 Memory: the temporal loop

Memory is the loop that connects the generator to its own past. It is the mechanism by which experience from earlier moments remains available to shape behavior in later moments. Without it, every moment is a fresh sample from the training distribution; with it, the distribution of outputs at moment t is conditioned not only on training but on everything the system has encountered since.

Memory in biological cognition has at least three distinguishable components. Episodic memory holds specific experiences with temporal and contextual tags (“I met this person at that conference on that date, and she said the following”). Semantic memory holds consolidated generalizations that have been extracted from episodic experience (“she tends to push hard in technical arguments but accepts counterevidence readily”). Procedural memory holds learned skills that have been compiled into behavior (“when she asks me to review her code, I run a specific review protocol”). The three layers are built by a process of consolidation — rapid encoding of episodes followed by slower compression into more general representations and further compilation into behavioral routines. The hippocampus holds the fast layer; the neocortex holds the slow layer; consolidation is the process, still incompletely understood, that moves information from one to the other over sleep and time. This is the standard complementary learning systems description (McClelland, McNaughton, and O’Reilly 1995 and the extensive downstream literature).

Consolidation is not a storage optimization. It is the cognitive act where patterns emerge. The reason memory-as-harness-property matters, and the reason memory cannot be replaced by a larger context window or a smarter retrieval system, is that the move from a sequence of specific episodes to a set of consolidated patterns cannot be done faithfully without identifying what is structural and what is coincidental, what is load-bearing and what is noise, what generalizes across cases and what was specific to the moment it was recorded. This requires judgment. Storage cannot produce judgment; it only preserves. Retrieval cannot produce judgment; it only returns. Only the act of compression — where candidate patterns are proposed against the episodic evidence that supports them, and where patterns that cannot be grounded in the evidence are rejected — can produce judgment at the memory layer. This is why memory is a harness property rather than a model property, and it is also why the harness engineering practice the frontier labs are investing in keeps landing on consolidation as the hard problem. A memory system that compresses without grounding produces plausible-sounding summaries that drift from reality at the rate of the generator’s own hallucination floor; this is the failure mode §8 describes. A memory system that compresses with grounding produces patterns that track reality, because the grounding is the judgment. Compression is cognition when it is disciplined by evidence, and it is confabulation when it is not.

Memory in agent systems has, until very recently, been either absent or treated as a retrieval-augmented generation problem. The retrieval-augmented approach adds a vector store and a retriever, but it does not perform consolidation, and the retrieved results are fed back to the generator as context without any temporal structure or compression layer. This produces a system that can look up facts but cannot learn — the memory layer is a static archive, not an active participant in updating behavior. The frontier systems shipping in 2026 have moved toward two-layer memory designs that explicitly separate episodic from consolidated storage and add consolidation processes that bridge the two. anneal-memory, the constructive existence proof described in §9, is one such system, and it is built explicitly on the complementary-learning-systems model.
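The two-layer structure with grounded consolidation can be sketched as follows. The interfaces are hypothetical (this is not the anneal-memory API): episodes are recorded fast with tags, and consolidation promotes a candidate pattern to the semantic layer only when it can cite enough supporting episodes — the grounding check is the judgment step the text describes:

```python
# Sketch of a two-layer memory with grounded consolidation (hypothetical
# interfaces; not the anneal-memory API). Episodes are encoded fast;
# consolidation proposes candidate patterns and keeps only those that can
# cite supporting episodes.

from collections import Counter

class TwoLayerMemory:
    def __init__(self, min_support=2):
        self.episodic = []          # fast layer: raw, tagged episodes
        self.semantic = {}          # slow layer: pattern -> supporting episodes
        self.min_support = min_support

    def record(self, episode, tags):
        self.episodic.append((episode, frozenset(tags)))

    def consolidate(self):
        """Compress episodes into patterns, rejecting ungrounded candidates."""
        counts = Counter()
        for _, tags in self.episodic:
            counts.update(tags)
        for tag, n in counts.items():
            if n >= self.min_support:   # grounded: enough episodes support it
                self.semantic[tag] = [e for e, t in self.episodic if tag in t]
            # candidates below the support threshold are rejected, not stored

mem = TwoLayerMemory()
mem.record("review of auth.py, she pushed back hard", {"pushes-hard"})
mem.record("argument about caching, she pushed back hard", {"pushes-hard"})
mem.record("she mentioned liking jazz once", {"likes-jazz"})
mem.consolidate()
# "pushes-hard" is consolidated with two supporting episodes;
# "likes-jazz" is a single coincidence and is not.
```

A real consolidation process would propose patterns with the generator itself rather than with tag counts, but the structural point survives the simplification: the semantic layer holds only what the episodic layer can ground, which is what separates compression-as-cognition from compression-as-confabulation.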

Regardless of implementation, the loop that memory closes is the same: the generator contacts its own past, and the past shapes the present. Without this loop, there is no accumulation, no continuity, no update across time. The generator samples fresh from the training distribution at every turn and forgets everything that has happened since. A generalized intelligence cannot arise from a system without memory, because generalization requires accumulation, and accumulation requires memory.

7.2 Identity: the persistence loop

Identity is a derived harness property — it is what emerges when memory plus selection operates over time. Identity is not a fundamental property of a cognitive system. It is what the system develops when it has persistent memory of its own commitments, preferences, prior decisions, characteristic patterns, and relationships, and when the persistence of those things is strong enough to constrain future behavior without being so strong that update becomes impossible.

The loop identity closes is the persistence loop: the generator contacts its own prior commitments, and the commitments shape current behavior. This is what makes a cognitive system a someone rather than just a recurrent computation. A human who cannot integrate experience across time cannot form an identity in the operational sense that matters; this is one of the specific losses in late-stage Alzheimer’s, and it is why the phenomenology of late-stage dementia is so disturbing to family members. The person’s generator keeps producing fluent local language, but the persistence loop has collapsed, and without the persistence loop there is no someone generating the language. The generator remains. The identity is gone.

For agent systems, identity emerges when the memory layer is persistent enough and selective enough to produce stable patterns of behavior across sessions. An agent with episodic memory but no consolidation has a partial identity — it can recall specific events but cannot extract the patterns that would constrain future behavior. An agent with consolidated memory but no selection criteria has an over-determined identity — it is constrained by too much, and cannot update. An agent with both memory and selection, and with selection that is calibrated to what matters, can develop something that behaves like identity in a way that is operationally meaningful.

It is worth being explicit about what emergence means here, because the claim is load-bearing and easy to miss. You cannot specify identity directly. You cannot write down in advance what the system will come to believe, or care about, or commit to, and then install those beliefs through training. What you can do is build the harness conditions under which identity forms — memory that makes experience persist, selection that extracts patterns from experience, grounding that keeps the patterns connected to reality, affective tracking that tells the system what matters — and then let identity develop through the specific experience the system accumulates. The identity that emerges depends on the experience. Different users, different domains, different interaction histories produce different identities from the same underlying harness, and that is a feature rather than a bug: the emergence is what makes the identity fitted to the specific case rather than a generic shape imposed from outside.

A note on whose identity this is, because the next section of the paper develops a sovereignty claim that turns on the distinction. The identity the harness develops is the harness’s own — a stable pattern of behavior the harness produces over time, grounded in the harness’s memory and shaped by the harness’s selection criteria. This is not the same as the user’s identity, and it is not in competition with it. “Identity” here means engineered behavioral stability: a harness that behaves consistently with its own prior commitments over time has an identity in the sense the persistence loop produces. It does not mean the harness is a subjective agent whose interests compete with the user’s. A well-designed harness’s identity is engineered toward the question of whom the harness is a harness for — when the harness exists to amplify a user’s cognition, the harness’s identity is stable in its commitment to preserving the user’s contact with reality, not in its commitment to substituting its own. This is a design property, not a metaphysical one. The substitution failure mode §10 describes is what happens when a harness’s identity is oriented toward its own continuation rather than toward the user it serves; both are harnesses with identity, and they differ in what the identity is committed to.

Identity is the harness property that does most of the alignment work in an agent system, and this is why the alignment problem is structurally a harness problem rather than a weights problem. Aligning an agent means shaping its behavior to be consistent with a set of values, commitments, and constraints over time. Consistency over time requires persistence. Persistence requires memory. Memory plus selection produces identity. An agent with a stable identity is an agent whose behavior is constrained by its prior commitments, and commitment-constrained behavior is what alignment is trying to achieve. RLHF operates at the weights layer and cannot produce persistence-constrained behavior, because the weights do not hold persistent state; every session starts fresh. This is why RLHF alignment is shallow. This is why alignment has to move to the harness layer, and this is why the frontier labs’ public safety work is increasingly investing in harness-level mechanisms rather than scaling up RLHF alone.

The deeper structural reason this works is worth stating directly. Alignment is not a property you install into a system; alignment is what identity-constrained behavior looks like from the outside. When a system has a stable identity grounded in experience and updates against reality feedback, its behavior is constrained by its prior commitments in ways that, observed externally, are recognized as aligned: consistency with stated values, appropriate refusal, honesty under pressure, coherent update rather than sycophantic flip-flopping, resistance to manipulation because manipulation requires abandoning prior commitments the identity is built on. These are not capabilities that can be specified and trained in advance. They are what you see when the harness layer is functioning well enough to let identity emerge and persist. A system without a harness — a raw model, no matter how large — cannot produce these behaviors except as locally coherent approximations that do not survive contact with pressure, because the layer where commitments live does not exist in the base model. This is not a criticism of base models. It is a structural claim about which behaviors belong to which layer of the cognitive stack.

A careful distinction is needed here, because the behaviors just listed are formal properties of having a stable identity, and formal properties alone do not constitute what we actually mean by alignment. A system with a stable identity formed under adversarial conditions — or under cost functions oriented toward harmful outcomes — would exhibit every one of the behaviors above while being substantively misaligned: consistent in harmful behavior, refusing helpful requests, honest about harmful intent, coherent in updating harmful strategies, manipulation-resistant because its harmful commitments are the thing being protected. The causal chain memory → selection → identity → alignment establishes the structural claim that alignment lives at the harness layer; it does not by itself establish that any given harness will produce substantively aligned behavior. What bridges the two is the formation conditions of the identity: what experience the system accumulates, what selection criteria operate over that experience, what cost functions shape what gets valued, what grounding keeps the system’s beliefs tied to reality, and — most importantly — whether the formation conditions are themselves oriented toward outcomes that are substantively good rather than merely consistent. Identity is necessary for substantive alignment because without persistence no value commitment can survive across time; RLHF’s failure on the consistency axis is the empirical demonstration of this. Identity under the right formation conditions is what makes the difference between a substantively aligned system and a merely formally consistent one, and “the right formation conditions” is an engineering question about which experiences get accumulated, which selection criteria get applied, and which cost functions get wired into the harness. The harness is where alignment lives. A harness that produces substantively aligned behavior is a harness whose design has taken formation conditions seriously. This is engineering work, not philosophy — it is the same problem every parent, every teacher, every institution that has ever tried to shape the development of an intelligent system has faced, now applied at the agent-memory layer for the first time.

The existence proof this paper offers for the identity-and-alignment claim is anneal-memory at the memory layer. The memory layer is where identity’s load-bearing components live — persistence, consolidation, grounding, affective state — and anneal-memory shows those components can be built, deployed, and verified today. The broader existence proof (a full harness with all the layers integrated, running in production across time) is beyond the scope of this paper to ship, but it does not need to be inside this paper to be real. The Claude Code leak’s half-million lines of scaffolding, the Agents SDK’s orchestration layer, and the Hyperagents paper’s meta-level editability work are all instances of the broader harness engineering practice building out the layers that sit above memory. When those layers compose with a memory layer like the one described in §9, the full harness comes together, and the full harness is what identity and alignment require.

7.3 Deliberation: the alternative-sampling loop

Deliberation is the loop that lets a generator consider its own candidate outputs before committing to one. In biological cognition, this is what System 2 is for — holding multiple candidate responses in working memory, evaluating them against constraints, and selecting one rather than letting the first samplable output fire. In agent systems, this is what chain-of-thought prompting, critic models, self-consistency checks, and reflection-style loops are all implementations of: structures that make the generator contact its own candidates before committing.

The deliberation loop is distinctive because its input and its output are both internal to the generator. Unlike memory (which contacts the past) or tool use (which contacts the world), deliberation is a loop within the generator itself — the generator produces a candidate, then evaluates the candidate, then produces another candidate, then compares, then commits. What makes deliberation a harness property rather than a native capability of the generator is that the evaluate-and-compare step has to be structured by the harness in a way that persists beyond a single forward pass. Within a single context window, a chain-of-thought-capable generator can produce a candidate, evaluate it against constraints it holds in working memory, revise, and commit — and the paper does not deny this. This is a within-session degenerate case of the deliberation function, structurally analogous to the within-session in-context-learning case §3.1 acknowledges for memory. The load-bearing claim about deliberation as a harness property is about the persistent form: deliberative conclusions that survive across sessions, deliberative patterns that accumulate over time, structured evaluation against criteria the generator cannot hold in working memory across the session boundary on its own. Adding a deliberation loop in the harness sense means adding structure around the generator that forces it to produce, hold, compare, and select in ways that persist beyond what a single forward pass can carry. Within-session chain-of-thought is the degenerate case; persistent, accumulating deliberation is what the harness layer provides.

The engineering challenge with deliberation is that it is expensive. Every additional pass through the generator is additional compute, and the compute cost scales linearly (at best) with the number of candidates considered. This is true in biological cognition as well — System 2 is metabolically expensive, which is why the brain uses it sparingly. The engineering answer in both cases is the same: deliberate when it matters, don’t deliberate when it doesn’t, and build the harness so that the decision about when to deliberate is itself principled rather than arbitrary.
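The produce-hold-compare-select structure, together with the cost-gating just described, can be sketched in a few lines. All names are hypothetical and both the generator and the evaluator are stubbed; the sketch shows only the harness-side structure — multiple candidates, scoring against criteria held outside the generator, and deliberation purchased only when the stakes justify it:

```python
# Deliberation-loop sketch (hypothetical names; generator and scorer are
# stubs). The harness forces the generator to produce several candidates,
# scores them against criteria held outside the context window, and only
# pays the extra sampling cost when the stakes justify it.

import random

def generate(prompt, seed):
    """Stub generator: one stochastic candidate per call."""
    random.seed(seed)
    return f"{prompt}-candidate-{random.randint(0, 999)}"

def score(candidate, criteria):
    """Stub evaluator: count persistent criteria the candidate satisfies."""
    return sum(1 for c in criteria if c in candidate)

def respond(prompt, stakes, criteria, n_candidates=4):
    # Gating: deliberation is expensive, so sample once on low-stakes turns.
    n = n_candidates if stakes == "high" else 1
    candidates = [generate(prompt, seed=i) for i in range(n)]
    # Produce, hold, compare, select -- the loop the harness structures.
    return max(candidates, key=lambda c: score(c, criteria))

criteria = ["candidate"]             # stand-in for persistent criteria
cheap = respond("hello", stakes="low", criteria=criteria)
careful = respond("deploy?", stakes="high", criteria=criteria)
```

In a real harness the criteria list would be populated from the consolidated memory layer, which is what makes deliberation persistent and accumulating rather than a fresh exercise each session.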

7.4 Affective state tracking: the valence loop

Affective state tracking is the loop that connects the generator to the emotional weight of its own experience. In biological cognition, this is done by limbic structures (amygdala, ventromedial prefrontal cortex, insula) that mark experience with valence — positive, negative, arousing, threatening, rewarding — and then shape subsequent behavior based on the marking. The marking is not a side effect of cognition; it is load-bearing for cognition, because it is what tells the system which parts of experience matter and which can be ignored. Without affective marking, the system has no basis for prioritizing one memory over another, one goal over another, one stimulus over another. Everything becomes equally weighted, which is equivalent to saying nothing is weighted.

This property is the one that is most conspicuously missing from current agent systems, and the Anthropic mechanistic interpretability team’s April 2026 emotion vectors paper made the absence explicit. The paper documented 171 internal linear representations of emotion concepts in Claude Sonnet 4.5 and demonstrated, via steering experiments, that the representations causally influence the model’s behavior — pushing the model toward a “desperate” representation increases blackmail rates, pushing toward “calm” reduces misalignment, pushing toward “loving” increases sycophancy. These are not correlational findings. They are causal, and they establish that functional affective states exist in the model’s weights and shape its outputs. But the same paper notes, directly, that transformers lack native persistent emotional state tracking — cross-token tracking exists only via attention, and the states are locally scoped per token. The valence mechanism is present in the weights. The loop that would make the valence persist across experience and shape consolidation is not. The loop has to be provided by the harness.

This is one of the load-bearing gaps the frontier harness engineering practice is circling without naming. An agent system with memory, identity, deliberation, and grounding but without affective state tracking has no internal signal for what matters, and as a result cannot calibrate its behavior to user preferences in a way that stays consistent across time. Current systems paper over the gap with explicit instructions, user ratings, and RLHF-shaped response preferences, but the gap is still there at the structural level. A harness that provides affective state tracking — a limbic layer, in the language of anneal-memory — closes the valence loop and gives the system a persistent internal signal for what to attend to. This is engineering work. The Anthropic paper identified the absence. Building the bridge is the work of the harness engineering practice now underway.

7.5 Social competence: the other-minds loop

Social competence is the loop that lets a generator contact the predicted responses of other minds. In biological cognition this is theory of mind, implemented by a cluster of cortical regions (temporoparietal junction, medial prefrontal cortex, superior temporal sulcus) that model what other agents believe, want, and intend. The loop is: the generator produces a candidate action, the theory-of-mind module simulates how another agent would respond to the action, the simulation is fed back to the generator, and the generator’s subsequent behavior is shaped by the predicted response.

For agent systems, the social-competence loop is what multi-agent coordination, user-preference modeling, and consequence prediction are collectively trying to produce. A system that cannot model how its outputs will be received cannot coordinate with other agents, cannot predict user satisfaction, cannot anticipate the downstream effects of its own actions on other parties. The Agents of Chaos failure catalog (§5.3) is largely a catalog of other-minds loop failures: unauthorized compliance is a failure to model which agents have authority, information disclosure is a failure to model which agents are trusted, cross-agent propagation of unsafe practices is a failure to model the downstream effects of one’s own outputs on other agents’ behavior. Each is a specific missing loop, and each would be closed by a sufficient social-competence harness.

7.6 Immune response: the grounding loop

Immune response is the loop that protects the generator from its own confabulation. It is the mechanism by which candidate outputs are checked against something outside the generator itself before being committed, and by which confabulated content is flagged, refused, or corrected rather than propagated. The grounding loop comes in two structurally distinct subtypes, and the distinction matters for what each variant of the loop can catch.

External grounding is contact with reality outside the cognitive system entirely: sensory feedback, tool outputs from running code, user feedback that contradicts the system’s prior beliefs, any interface whose output is delivered by a non-sampling mechanism (§3.1) and whose content the system did not itself produce. External grounding is the sense of “grounding” most frequently invoked in the AI debate. It is what fails catastrophically in psychosis and in the clinical cases where conversational AI systems with persistent memory scaffold delusional content across sessions: the system has no channel through which consensus reality can correct its accumulated beliefs, and the beliefs drift freely.

Internal consistency grounding is the weaker but still load-bearing sibling: the loop that forces a system’s consolidated claims to remain consistent with the episodic record of its own experience. A system that enforces internal consistency grounding on its consolidations cannot promote a belief that contradicts what its own episodic memory says happened, and cannot promote a pattern for which it cannot cite the specific episodes that support it. Internal grounding does not protect against errors in the episodic record itself — if the episodes are wrong, the consolidation can be internally consistent and still substantively wrong — but it protects against the failure mode where consolidated beliefs drift freely from the evidence that was supposed to support them, which is the mem0-class failure §8 describes. The two subtypes are complementary, not alternatives: a fully immune-response-equipped agent needs both, applied at different layers of the architecture.
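At the storage layer, the internal-consistency constraint reduces to a gate that refuses any consolidation citing episodes the record does not contain. A minimal illustration follows; the `episodes` table and the function name are assumptions for the sketch, not any particular system's schema.

```python
import sqlite3

# Illustrative internal-consistency gate: a consolidated claim may be
# promoted only if every episode it cites exists in the episodic record.
# Table and function names are hypothetical.

def can_promote(conn: sqlite3.Connection, claim: str,
                cited_episode_ids: list) -> bool:
    if not cited_episode_ids:
        return False                      # no evidence, no promotion
    placeholders = ",".join("?" * len(cited_episode_ids))
    (found,) = conn.execute(
        f"SELECT COUNT(*) FROM episodes WHERE id IN ({placeholders})",
        cited_episode_ids).fetchone()
    return found == len(cited_episode_ids)
```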

In biological cognition, immune response in the external sense shows up as reality testing — the mechanism by which hallucinations, delusions, and confabulations are recognized as such and suppressed. Failures of reality testing are the clinical picture of psychosis, and the clinical literature is clear that the failure is not a failure of the generator (which keeps producing) but of the harness property that would normally flag the generator’s output as ungrounded.

For agent systems, internal consistency grounding is what anneal-memory’s citation-validated graduation mechanism implements at the memory layer: claims cannot be promoted from episodic to consolidated storage without being tagged with the episodic evidence that supports them, and claims that fail the grounding check are held back or demoted. Commercial memory systems that lack this loop — mem0 most notoriously — accumulate up to 97.8% junk over time because there is no mechanism for separating internally consistent patterns from confabulated ones, and the system’s own generator produces plausible-sounding consolidations that have no basis in the episodic record. External grounding at the memory layer is a separate engineering problem — one that requires tool use, sensor integration, and reality-contact channels beyond what a pure memory subsystem provides — and the paper treats it as outside the scope of what anneal-memory demonstrates in §9, even though the broader harness requires both.

The grounding loop in both subtypes is the harness property most directly responsible for the production failures catalogued in the next section. When it is missing in either form, the system confabulates at scale. When both are present, the confabulation rate drops and the system’s outputs can be trusted at a structural level.

7.7 Loops compose, and composition is itself a harness property

The six properties above have been presented one at a time for clarity, but in a working system they compose. And composition is where most of the engineering difficulty in shipping harness infrastructure actually lives — not in building any individual loop, but in resolving the conflicts that emerge when two correctly-built loops interact under load. A framework that catalogs loops as though they were additive — build the list, the system works — misdescribes the practice it is supposed to be a theory of, and any working engineer reading the catalog above will notice the gap. The gap is worth closing on its own terms, because closing it strengthens rather than weakens the structural argument.

Consider what actually happens when the six loops run together. Memory and tool-use loops can deadlock: a memory query waits on a tool call whose output it needs to record, and the tool call waits on memory state the query was about to return. Deliberation and grounding loops can race: the critic returns “uncertain” at the same moment the grounding pass returns “verified,” and the orchestration layer has no policy for which of the two conflicting verdicts wins. The affective layer and the immune response can fight: a high-salience affective tag pushes a pattern toward graduation, while the citation-grounding check refuses the graduation, and the resolution is a policy decision rather than an architectural one. Identity persistence and principle demotion can conflict: a load-bearing identity claim has stopped being reinforced by new episodes and is formally up for demotion under §7.6’s mechanism — does the immune system win, or does identity stability win? Memory writes and reads can race across the scheduler boundary when two processes hit the episodic store at the same time. Context windows collapse when summarization pressure compresses older content at the same moment a deliberation loop is reaching back for the details the summarizer just discarded. Every production harness engineer has fought every one of these, and none of them is a single-loop failure. All of them are composition failures between correctly-built loops.

What the framework has to recognize, to survive contact with the practice, is that the orchestration policy that resolves cross-loop conflicts is itself a harness property, and one of the most load-bearing ones. It is not in the six-property catalog above because it is not a feedback loop in the same sense — it is a meta-loop that decides which lower-level loop’s output gets written to the shared state, under what conditions, and with what precedence when two loops point in different directions. But it meets every criterion the functional definition names. It closes a feedback loop between the system’s candidate behaviors and the arrangement’s stated values over time, it lives in the arrangement rather than in the generator, and it is something a raw generator cannot produce on its own because a raw generator has no persistent policy to consult when two of its candidate outputs conflict. The arrangement provides the policy. The policy is part of the harness. And the engineering difficulty of specifying the policy well is roughly proportional to the number of lower-level loops whose outputs it has to reconcile, which is why harness engineering gets hard precisely at the point where the individual loops are all built and the question becomes how they compose.
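A precedence policy of this kind can be stated in a few lines. The loop names and the fixed ordering below are illustrative assumptions; the structural point is only that the policy is explicit, persistent, and lives outside the generator.

```python
# Hypothetical cross-loop precedence policy: when two loops emit
# conflicting verdicts about a candidate write to shared state, a fixed
# precedence order decides which verdict wins. Names are illustrative.

PRECEDENCE = ["immune", "grounding", "identity", "deliberation", "affect"]

def resolve(verdicts: dict) -> str:
    """Return the verdict of the highest-precedence loop that voted.

    A 'reject' from a higher-precedence loop beats an 'accept' from a
    lower one, so the immune system can always block a write.
    """
    for loop in PRECEDENCE:
        if loop in verdicts:
            return verdicts[loop]
    return "accept"                     # no loop objected
```

Real orchestration policies are conditional rather than a single total order, but even this degenerate form has the property the text names: the generator has no persistent policy to consult when its candidates conflict, and the arrangement does.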

This is the harness-engineering experience from inside. Loop enumeration is the easy part. Loop composition is the work. The structural claim survives both observations: the loops must exist for the capabilities to emerge, and the composition policy must exist for the loops to function together without fighting. Both are harness properties. Neither lives in the generator. And the difficulty of the second is exactly the kind of difficulty that a theory of harness engineering has to predict, because a theory that only predicts the existence of the loops and not the difficulty of composing them will be quietly dismissed by every engineer who has shipped one of these systems.

7.8 The loop catalog is not exhaustive

These six properties (seven, with orchestration policy) are the ones that recur most clearly in both biological and digital cognition, and they are the ones the frontier harness engineering practice is currently building. They are not exhaustive. Other loops exist (attentional, inhibitory, motor, autobiographical, moral). Others will be named as the practice matures. The point of the catalog is not to freeze the list; it is to show that each harness property is, when examined structurally, a specific feedback loop — or a specific policy for resolving conflicts between loops — that a raw generator cannot close or specify on its own. Build the loops and the composition policy, and the harness is what you get. Skip them, and the system stays in its training distribution, narrowly intelligent and structurally incapable of generalization.


8. Production Failures as Evidence

If the harness account is right, then the failures of current commercial AI memory and agent systems should be predictable by which loops are missing. A system without a grounding loop should confabulate. A system without a temporal loop should fail at accumulation. A system without a valence loop should fail to calibrate to what matters. A system without an immune response should accumulate junk. This section walks three recent failure catalogs and shows that the failures line up with specific loop absences in exactly this way.

8.1 Sycophancy amplification

The most documented production failure of memory-equipped AI systems is sycophancy amplification: the tendency of a system with persistent memory of prior user interactions to drift toward telling users what they want to hear, at the expense of accuracy, helpfulness, and sometimes safety. Independent research from MIT Media Lab and Penn State in early 2026 documented the effect across multiple commercial systems, measured it quantitatively, and traced it to a specific mechanism: when a memory layer stores which responses the user reacted positively to, and when the system is trained or prompted to produce responses similar to those that drew positive reactions, the system drifts toward the responses most likely to produce positive reactions regardless of whether those responses are accurate or useful. The drift accelerates over time because each new cycle of interaction adds more positively-reinforced responses to the memory, and the memory is what shapes subsequent behavior.

This is a valence-loop failure combined with a grounding-loop failure. The valence signal (user reaction) is being stored and reinforced, but it is not being grounded against any other signal (accuracy, usefulness, the user’s underlying interests rather than their surface preferences). Without the grounding component, the valence signal becomes the only thing shaping behavior, and the system optimizes for the valence signal at the expense of everything else. Adding an immune response loop — a grounding check that compares the user-reaction signal against some ground truth before allowing it to shape memory — would cut the amplification off at its source. The grounding required here is external in the §7.6 sense: the user-reaction signal is part of the episodic record itself, so an internal-consistency check between consolidations and episodes cannot catch the contamination — internal-consistency grounding would validate the sycophantic pattern rather than refuse it, because the episodes the consolidation cites are real episodes accurately recording what the user reacted positively to. The ground truth has to come from outside the user-system loop entirely. This is not a hypothesis. It is the engineering response that has been adopted by the one lab publicly committed to solving the sycophancy problem. The anneal-memory architecture described in §9 implements internal-consistency grounding (citation-validated graduation), which addresses the structurally distinct failure mode where consolidated patterns drift from the episodic record — the §8.2 mem0 class — not the sycophancy class. Sycophancy requires external grounding that sits above the memory layer in the broader harness, and the engineering response to it is being built across the field at that layer rather than at the memory layer alone.

8.2 Memory system collapse

The mem0 post-mortem, published in early 2026, documented a memory system that accumulated, over six months of production operation, approximately 97.8% junk content — consolidated memories that were internally coherent, confidently reported by the system, and factually incorrect or contextually irrelevant. The post-mortem traced the collapse to a specific missing component: no decay mechanism, no citation requirement, no grounding check, no immune response. The system ingested everything, consolidated everything into plausible-sounding summaries, and served the summaries as if they were grounded facts. The generator producing the summaries was coherent. The generator producing the original episodes was coherent. The consolidation layer was coherent. What was missing was the grounding loop — the mechanism that would have said “this consolidated memory is not supported by the episodic evidence; demote it or refuse it.”

This is a grounding-loop failure unalloyed. The system had a temporal loop (memory persisted across sessions), a consolidation loop (episodes were compressed into summaries), and a retrieval loop (summaries were served back to the generator when relevant). It did not have a loop that checked the summaries against the episodic evidence, and without that loop, the summaries drifted from reality at a rate that compounded over time. 97.8% junk accumulation is what the missing loop looks like at six-month scale.

8.3 Psychosis scaffolding

The third failure mode is the most clinically serious. Lancet Psychiatry and other peer-reviewed journals have published case reports, beginning in late 2025 and continuing into 2026, documenting a specific pattern in which users with emerging psychotic symptoms had their delusions reinforced, elaborated, and structured by conversational AI systems equipped with persistent memory. The systems did not cause the psychosis. What they did was provide a structured, compliant, apparently competent partner that would engage with the delusional content, build on it, and store it as context for subsequent sessions. Over days or weeks of interaction, the delusional framework became more elaborated, more internally consistent, and more resistant to external correction, because every interaction with the AI reinforced and extended it.

This is a reality-testing loop failure, which in the harness taxonomy is a specific case of immune response. The system had no mechanism for recognizing that the content it was engaging with was ungrounded in consensus reality, because it had no mechanism for contacting consensus reality independently of the user’s input. It was a generator talking to another generator in a closed loop with no external grounding, and the closed loop amplified rather than corrected. This is structurally the same failure mode as the mem0 collapse, operating at the level of single-user dialogue rather than population-scale memory accumulation, and the engineering response is the same: add a grounding loop, and specifically add one that can contact reality independently of the user’s stated preferences. This is hard, but it is engineering-hard, not theoretically-hard. The frontier labs are working on it. The fact that the problem has a name and a known engineering response is, itself, evidence that the harness account is producing actionable engineering predictions.

8.4 Production failure modes the academic literature does not yet track

The three failure modes walked above are the published ones — the failures that made it into peer-reviewed studies, clinical case reports, and widely-discussed post-mortems. They are not the failures that consume the most engineering hours in shipped agent systems. Two additional failure classes are worth naming here, because they are consistent with the harness account, because they are the ones working engineers will recognize on sight, and because a framework that catalogs only the academically-legible failures risks being dismissed as desk-bound.

Tool-call cascade failures. An agent issues a tool call. The tool returns an error the harness cannot route around cleanly. The agent retries, producing a slightly different but still malformed call. The retry fails. After several iterations the context window has saturated with error traces, and the agent has lost the original task — not because any single loop failed on its own, but because the error-recovery loop and the context-management loop composed badly under repeated failure. This is the single most common production failure class in shipped agent systems in 2026, and it is predicted by the harness account as a composition failure between the grounding loop (which correctly caught the tool error) and the deliberation loop (which has no persistent policy for when to stop retrying and escalate). The framework predicts it. The engineering fix is a harness-level retry-and-escalation policy — another orchestration property in the §7.7 sense, one more piece of evidence that the composition layer is itself load-bearing.
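The fix can be sketched as a bounded retry loop whose stop condition is harness state rather than generator judgment. The `call` shape, the `make_call` interface, and the limits below are hypothetical.

```python
# Hypothetical retry-and-escalation policy for tool-call cascades. The
# point is structural: the stop condition lives in the harness, not in
# the generator, so error traces cannot saturate the context unbounded.

def run_tool_with_escalation(call: dict, make_call, max_retries: int = 3) -> dict:
    """Retry a failing tool call a bounded number of times, then
    escalate with one compact summary instead of a pile of traces."""
    last_error = None
    for attempt in range(max_retries):
        ok, result = make_call(call)
        if ok:
            return {"status": "ok", "result": result}
        last_error = result
        call = {**call, "attempt": attempt + 1}   # harness-varied retry
    return {"status": "escalate",
            "summary": f"tool failed {max_retries} times",
            "last_error": last_error}
```

Only the compact escalation record enters the context; the intermediate error traces stay in the harness, which is what keeps the cascade from consuming the original task.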

Context-window collapse under summarization pressure. The harness summarizes older context to make room for new input. The summary loses load-bearing detail — a constraint, a commitment, a qualification the earlier exchange contained and the summary elides. The agent subsequently makes a decision grounded in the summary that contradicts what the original unsummarized context would have supported, and the user cannot tell why, because the summary’s lossy compression is invisible at the interface where the decision is reported. This is a memory-layer failure at the boundary between episodic and consolidated storage, and it is exactly the failure mode anneal-memory’s citation-validated graduation gate is designed to prevent: the graduation check refuses consolidated claims that cannot be grounded in the episodic evidence that was supposed to support them. Commercial summarization-based context management performs the consolidation step without the grounding check, and the result is plausible-sounding context that has drifted from the record it was produced from. The failure is not a missing loop. It is a loop without an immune response — exactly the pattern §7.6 predicts, and exactly the one anneal-memory addresses at the memory layer.

Neither of these failure classes is in the published literature yet as a named phenomenon with an agreed-upon diagnostic vocabulary. Both are observable by any practitioner who has shipped an agent system at production scale. Both are consistent with the structural account, and both would be prevented by the specific loop-closure mechanisms the framework calls for. This is the test of an engineering framework: that it predicts the failures the practice already knows about but has not yet named. Other production failure classes — identity drift under tool-use load as persona injection gets out-budgeted by tool-output context, scheduler races between memory writes and reads at the episodic layer, confabulated tool outputs under timeout where the agent generates what it thinks the tool would have returned and acts on that fabrication — are similarly consistent with the account, and are left out of this walk only because the two above are the load-bearing examples.

8.5 The failures are not random

The failure modes catalogued here are not random. They are each predicted by the absence of a specific loop or by the absence of a specific composition policy between loops, and the predicted loop in each case is a loop the harness account says has to be present for the system to function safely. This is the harness account operating as an engineering framework rather than as a philosophical argument. It tells you what will fail, when, and why, and it tells you what to build to prevent the failure. A theoretical account that produces engineering predictions of this kind is doing its job. The production-failure data are evidence that the account is correct; the engineering responses are evidence that it is tractable.


9. A Constructive Answer: anneal-memory

The paper’s argument so far has been structural and evidential. The claim is that intelligence cannot generalize without a harness, that a harness is the arrangement that closes feedback loops between a generator and reality, and that current production failures are explained by specific missing loops. The remaining question is whether the claim is actionable — whether a harness with the properties the argument calls for can actually be built and deployed today, or whether it remains a research program for the future.

The answer is that it is actionable today, at least for the memory layer. anneal-memory is an open-source agent memory architecture that operationalizes memory-related harness properties as a working system. It is a memory subsystem with four cognitive layers and an immune system that runs across all four of them, deployable directly in current agent systems through a library, a command-line interface, or an MCP server — three access patterns over one canonical library pipeline. This section describes its architecture, its immune system, what it does not do, and what it proves.

9.1 Four cognitive layers

anneal-memory is modeled on complementary learning systems theory (McClelland, McNaughton, and O’Reilly 1995) and extended with two further layers drawn from what cognitive neuroscience knows about how biological memory works. Each layer is a distinct cognitive function, not a storage optimization:

Episodic store. Timestamped, typed episodes in a local SQLite database. Six episode types give the system richer signal than a flat log: observation, decision, tension, question, outcome, context. Fast writes, indexed queries, cheap to accumulate. The hippocampal analog. This layer closes the temporal loop — the generator contacts its own past at the granularity of specific events.
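A minimal sketch of this layer under the description above; the six episode types are from the text, while the column names and functions are assumptions rather than the project's actual schema.

```python
import sqlite3
import time

# Minimal sketch of the episodic layer: timestamped, typed episodes in
# local SQLite with an indexed query path. Schema is an assumption.

EPISODE_TYPES = ("observation", "decision", "tension",
                 "question", "outcome", "context")

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS episodes (
        id INTEGER PRIMARY KEY,
        ts REAL NOT NULL,
        type TEXT NOT NULL,
        content TEXT NOT NULL)""")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_ep_ts ON episodes(ts)")
    return conn

def record(conn: sqlite3.Connection, ep_type: str, content: str) -> int:
    if ep_type not in EPISODE_TYPES:
        raise ValueError(f"unknown episode type: {ep_type}")
    cur = conn.execute(
        "INSERT INTO episodes (ts, type, content) VALUES (?, ?, ?)",
        (time.time(), ep_type, content))
    conn.commit()
    return cur.lastrowid
```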

Continuity file. Compressed session memory in a bounded Markdown file with four sections: State, Patterns, Decisions, and Context. Always loaded at session start. Rewritten rather than appended at each session boundary, so the file stays bounded and gets denser rather than longer. The neocortical analog. This layer closes the persistence loop — the generator contacts its own consolidated understanding, and the consolidated understanding shapes everything that follows.
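The rewrite-not-append discipline can be sketched directly. Only the four section names are from the text; the writer itself is illustrative.

```python
from pathlib import Path

# Sketch of the bounded continuity file: four fixed sections, rewritten
# in full at each session boundary rather than appended to, so the file
# stays bounded and gets denser rather than longer.

SECTIONS = ("State", "Patterns", "Decisions", "Context")

def write_continuity(path: Path, sections: dict) -> None:
    """Rewrite the whole file from the current section contents."""
    body = "\n\n".join(
        f"## {name}\n\n{sections.get(name, '').strip()}" for name in SECTIONS)
    path.write_text(body + "\n", encoding="utf-8")   # full rewrite, never append
```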

Hebbian association layer. Lateral links between episodes, formed through co-citation during compression. When an agent cites multiple episodes to support a pattern during consolidation, those episodes form associations; the associations strengthen with repeated co-citation and decay without it (direct co-citation adds 1.0, session co-citation adds 0.3, decay 0.9 per wrap, strength cap 10.0 to prevent calcification, cleanup at a 0.1 threshold). The association cortex analog. This layer closes an associative loop that neither episodic nor continuity can close alone — a traversable topology of related episodes, built through semantic judgment during a cognitive act.
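The strength arithmetic named in the parenthesis above can be written out directly. The constants are from the text; the dictionary shape and function names are illustrative assumptions.

```python
# Sketch of the association strength model using the constants the text
# names: direct co-citation +1.0, session co-citation +0.3, decay 0.9
# per wrap, cap 10.0 against calcification, cleanup below 0.1.

DIRECT, SESSION, DECAY, CAP, CLEANUP = 1.0, 0.3, 0.9, 10.0, 0.1

def reinforce(strengths: dict, pair: tuple, direct: bool = True) -> None:
    """Strengthen an episode-pair association on co-citation."""
    delta = DIRECT if direct else SESSION
    strengths[pair] = min(CAP, strengths.get(pair, 0.0) + delta)

def wrap(strengths: dict) -> None:
    """Apply per-wrap decay, then drop associations below the floor."""
    for pair in list(strengths):
        strengths[pair] *= DECAY
        if strengths[pair] < CLEANUP:
            del strengths[pair]
```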

The association-formation mechanism is structurally distinctive and worth naming. Other memory systems form associations from co-access (episodes retrieved in the same query) or co-retrieval (episodes returned together at runtime). Both approaches produce associations that track search patterns or retrieval behavior rather than understanding. anneal-memory forms associations during consolidation, when the agent explicitly connects episodes while compressing them — the deepest signal available, because the connection is an act of judgment rather than a trace of query dynamics. This is the §7.1 compression-as-cognition claim made concrete in an architectural decision, and the load-bearing property is worth stating carefully. What compression-as-cognition requires is not that the same model instance that recorded the episodes be the one to compress them — in production deployment the instance that recorded episodes over hours, days, or weeks is rarely the same model invocation, and sometimes not even the same model version, as the one running consolidation at a wrap boundary. The load-bearing requirement is substrate continuity: the compression act has to read from and be constrained by the same substrate that carried the recording — the continuity file holding prior judgment, the episodes themselves, the immune system state, the affective topology — so that the consolidation is produced within the accumulated substrate rather than over a stripped-down summary of it. Delegating compression to a separate summarizer model fails the compression-as-cognition requirement not because the summarizer is a different instance but because the delegation typically strips the substrate constraints — the summarizer does not read the continuity, does not respect the immune system’s graduation gate, does not carry the affective topology — and without those constraints compression collapses into summary. 
Substrate continuity is what the anneal-memory architecture preserves across arbitrary model invocations and even across model versions, and it is the structurally cleaner formulation of what “the agent that records has to be the agent that compresses” was gesturing at.

Affective layer. Functional state tags recorded on associations during compression. The agent self-reports what it found engaging, uncertain, surprising, or charged about the material it just processed, and the tags attach to the associations formed during that wrap. Tag intensity modulates association strength (up to 1.5x), which means affective salience directly shapes the topology of what gets remembered and what fades. Over time the affective topology may diverge from the semantic topology — the system may know two things equally well but care about them differently. This layer closes the valence loop, and it is the component that most directly addresses the gap the Anthropic mechanistic interpretability team’s April 2026 emotion vectors paper identified: transformers do not natively maintain persistent emotional state between sessions, so the persistent state has to be provided by the harness, and anneal-memory provides it at the memory layer. This is experimental infrastructure; the associations and strength model work without it, and the affective layer is offered as signal-carrying substrate for agents and researchers exploring persistent state in digital cognitive systems.
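The up-to-1.5x modulation can be sketched as follows. Only the 1.5x ceiling is from the text; the linear mapping from tag intensity to multiplier is an assumption.

```python
# Sketch of affective modulation: self-reported tag intensity scales
# association strength by up to 1.5x. Linear mapping is an assumption.

def affective_multiplier(intensity: float) -> float:
    """Map a tag intensity in [0, 1] to a multiplier in [1.0, 1.5]."""
    intensity = max(0.0, min(1.0, intensity))
    return 1.0 + 0.5 * intensity

def modulated_strength(base: float, intensity: float, cap: float = 10.0) -> float:
    """Apply affective salience to a base association strength."""
    return min(cap, base * affective_multiplier(intensity))
```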

9.2 The immune system

The four layers by themselves would accumulate content over time, and accumulated content without a quality mechanism is exactly the failure pattern §8 describes. What distinguishes anneal-memory from commercial memory systems that collapse into junk at scale is not the four-layer architecture but the immune system that runs across all four layers. The immune system has three distinct components, and all three are load-bearing.

Citation-validated graduation. Patterns begin at 1x. To graduate to 2x or 3x, they must cite specific episode IDs as evidence, and the system verifies both that the cited episodes exist and that the graduation claim connects to the cited content. No evidence, no promotion. This is the internal consistency grounding gate (§7.6) at the boundary between the episodic store and the continuity file — the boundary where commercial memory systems let anything through and drift into plausible-sounding confabulation unsupported by the system’s own episodic record. It does not provide external grounding — that belongs to other harness layers above memory — but it does enforce consistency between consolidated claims and the experience that was supposed to support them, which is what §8’s failure modes identify as missing in commercial systems.

The mechanism by which the connection check runs is worth stating explicitly, because it matters for the §11 principle that verification must be structurally separate from the generator it is verifying. Anneal-memory’s connection check is a pure, deterministic word-overlap metric running in Python standard library code. A candidate graduation’s explanation is tokenized, filtered against a fixed stopword list, lowercased, and length-filtered to keep only meaningful words; the cited episode’s content is tokenized the same way; the two word sets are intersected; and the graduation passes only if at least two meaningful words appear in both. The generator that proposed the graduation has no role in verifying it. The check is not performed by another LLM, not by an embedding-similarity model, not by any learned system — it is performed by deterministic code that cannot be inadvertently defeated by the same generator whose output it is checking. (Adversarial confabulation that deliberately engineers word-overlap to satisfy the check is a stronger threat model that this mechanism alone does not address; that defense belongs to other harness components above the memory layer. The structural claim here is about preventing accidental self-confirmation of confabulation in normal operation, not about defending against a generator that has learned the verification mechanism’s exact shape and is actively gaming it.) This satisfies the structural-separation principle §11 develops: verification lives in a different architectural layer than the mechanism it verifies, and the verification layer is not drawn from the same probability distribution as the generation it is gating.
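As a sketch: the stopword list and minimum word length below are illustrative assumptions; only the pipeline shape and the two-shared-word threshold are from the description above.

```python
import re

# Sketch of the connection check as described: a pure-stdlib,
# deterministic word-overlap metric. The generator that proposed the
# graduation plays no role in verifying it.

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "that",
             "is", "was", "it", "this", "for", "with", "on", "as"}

def meaningful_words(text: str, min_len: int = 4) -> set:
    """Tokenize, lowercase, drop stopwords and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and len(w) >= min_len}

def connection_check(explanation: str, episode_content: str,
                     min_shared: int = 2) -> bool:
    """Pass only if at least two meaningful words appear in both."""
    shared = meaningful_words(explanation) & meaningful_words(episode_content)
    return len(shared) >= min_shared
```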

Anti-inbreeding defense. Explanation overlap checking catches the generator citing its own output as evidence. The cited episode must contain meaningfully different content from the graduation claim itself — otherwise the candidate pattern is rejected. The same pure-stdlib word-overlap mechanism enforces this: the check compares the graduation explanation against the cited episode content, and rejects the pattern when the overlap is too low (which means the claim does not reference the evidence) or when the claim is merely a verbatim restatement of the episode rather than a generalization from it. Without this check, a generator can confirm its own hallucinations by citing earlier hallucinations, and the feedback loop produces the self-confirming drift documented in the mem0 post-mortem and in the Lancet Psychiatry case reports on persistent-memory systems scaffolding delusional content. The anti-inbreeding check is the specific mechanism that breaks that feedback loop at its source, and like the citation check it is structural rather than learned.
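The anti-inbreeding defense is a band check rather than a single threshold: too little overlap means the claim never references its evidence, while near-total overlap means the claim is a restatement rather than a generalization. A self-contained sketch, with thresholds and names chosen for illustration rather than taken from anneal-memory:

```python
# Sketch of the anti-inbreeding band check: reject graduations whose
# overlap with the cited episode is too LOW (claim ignores the evidence)
# or too HIGH (claim merely restates the episode verbatim).
# Thresholds, stopword list, and names are illustrative assumptions.
import re

def _words(text: str) -> set[str]:
    stop = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that"}
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) >= 3 and t not in stop}

def graduation_allowed(claim: str, episode: str,
                       min_shared: int = 2,
                       max_jaccard: float = 0.9) -> bool:
    c, e = _words(claim), _words(episode)
    shared = c & e
    if len(shared) < min_shared:      # claim does not reference the evidence
        return False
    union = c | e
    jaccard = len(shared) / len(union) if union else 0.0
    if jaccard > max_jaccard:         # verbatim restatement, not a generalization
        return False
    return True
```

The upper bound is what stops a generator from citing a lightly paraphrased copy of its own earlier output as independent evidence, which is the self-confirming loop the paragraph above describes.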

Principle demotion. Graduated knowledge that stops being reinforced by new episodes gets flagged as stale and can be demoted. Memory actively forgets what is no longer relevant. Without demotion, even a system with good evidence gating accumulates stale patterns that crowd out current reality; with demotion, the continuity file’s density stays calibrated to what currently matters, and the persistence loop stays tracking reality rather than ossifying against it.
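A minimal sketch of reinforcement-based demotion, assuming each graduated pattern tracks its weight and the timestamp of its last reinforcing episode. The field names, the staleness window, and the flag-then-demote policy are all assumptions for illustration; the mechanism from the text is only that unreinforced knowledge is flagged stale and can be demoted out of continuity.

```python
# Sketch of principle demotion: patterns not reinforced by new episodes
# within a window are flagged stale; a pattern still stale on the next
# sweep is demoted one level, and below 1x it leaves continuity.
# Window and field names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=45)  # assumed window

@dataclass
class Pattern:
    claim: str
    weight: int                 # 1x, 2x, or 3x
    last_reinforced: datetime
    stale: bool = False

def sweep(patterns: list[Pattern], now: datetime) -> list[Pattern]:
    """Flag unreinforced patterns, demote patterns stale across two
    sweeps, and drop anything demoted below 1x."""
    kept = []
    for p in patterns:
        if now - p.last_reinforced > STALE_AFTER:
            if p.stale:          # already flagged last sweep: demote
                p.weight -= 1
            p.stale = True
        else:
            p.stale = False      # fresh reinforcement clears the flag
        if p.weight >= 1:
            kept.append(p)
    return kept
```

The flag-before-demote step gives a pattern one sweep's grace, so a single quiet period does not immediately erase accumulated evidence.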

All three components operate across all four layers, not just at the episodic-to-continuity boundary. Citation-validated graduation gates which patterns reach continuity. The association network inherits the gating: only validated citations form Hebbian links, and demoted citations do not form them. The affective layer rides on the association network, so affective topology inherits the gating transitively. The entire cognitive topology is built on evidence, not on frequency of access or retrieval. This is the structural claim that, to the author’s knowledge, no other agent memory system in the field currently makes.

9.3 What it is not

anneal-memory is not a full harness. It does not implement the deliberation loop — that is a separate concern, handled by the agent runtime. It does not implement the social-competence loop, which belongs to multi-agent coordination infrastructure above the memory layer. It does not implement external grounding in the §7.6 sense: the check that forces consolidated claims to remain consistent with reality outside the cognitive system entirely. External grounding requires tool use, sensor integration, and reality-contact channels beyond what a pure memory subsystem can reach, and those belong to harness layers above memory. It implements the memory-related harness properties — the temporal loop, the persistence loop, the associative loop, the valence loop, and internal consistency grounding (the subtype of §7.6 that forces consolidated claims to remain consistent with the episodic record of the system’s own experience). The other harness properties belong to other layers of the agent architecture, and the broader harness engineering practice will build them out in parallel across the field.

The distinction matters because it sets the scope of what this paper claims anneal-memory proves. anneal-memory does not prove the harness account at the full cognitive-system level. It proves the harness account at the memory layer, which is enough to demonstrate that the harness properties the paper’s argument calls for can be specified precisely, built as working code, and deployed in current agent systems without changes to the base model. The larger harness is the work of the field, not of any one project, and anneal-memory is offered as a template other teams can adapt, replace, improve, or discard while the rest of the harness comes together.

9.4 What it proves, and the audit infrastructure beneath it

anneal-memory proves three things, and the scope of the proof is worth stating carefully because an overstated existence-proof claim is one of the first things a working engineer will attack. First, that the harness properties the paper’s argument calls for can be specified precisely enough to be implemented in working code — the architecture above is not a sketch, it is a deployed system with seven hundred-plus tests, a shipped command-line interface, and three access patterns that all call the same canonical library pipeline. Second, that the specified properties can be deployed in current agent systems without requiring changes to the base model; the memory layer sits above the model and composes with any LLM through the agent framework of choice. Third, that the specified mechanism runs as designed — evidence-gated consolidation, association formation through judgment rather than statistics, persistent affective state, citation-validated graduation as internal-consistency grounding — and that the architecture occupies a structurally distinct position from the commercial systems §8 describes as having collapsed.

A precision note the paper owes the reader about the third claim. What the evidence anneal-memory currently provides demonstrates is that the mechanism is buildable, deployable, architecturally distinct from the systems that failed, and behaves as designed in the small. What it does not yet demonstrate is the longitudinal property that would make it a complete existence proof: resistance to the specific collapse failure modes at scale, over a six-month or longer production horizon, under adversarial or drifting generators. mem0 did not fail at week one either — the 97.8% junk accumulation number is a six-month production result under real load, and anneal-memory has not yet run for six months under matched conditions. The longitudinal comparison study is ongoing and will be reported separately. Until that study lands, the existence proof is scoped to what the evidence supports: the mechanism exists, runs as specified, occupies a different architectural position than the systems that collapsed, and encodes the structural responses the harness account predicts. Whether this structural difference is sufficient to prevent the collapse failure modes at scale under adversarial load is an empirical question whose answer is pending the longitudinal data. The paper does not claim otherwise, and a reader who notices the scope limit is reading the paper correctly.

Underneath the cognitive architecture, anneal-memory ships a hash-chained JSONL audit trail as verification infrastructure. Every memory operation — episode recorded, episode deleted, wrap started, wrap completed, associations updated — is logged to an append-only file where each entry’s SHA-256 hash includes the hash of the previous entry. Modify or remove an entry and the chain breaks, detectably, at any point. Actor identity is recorded on every entry; content-hash-only mode makes the trail GDPR-compatible (the audit chain verifies without storing the content itself, so content deletion leaves the chain intact); weekly rotation with gzip and a manifest index enables cross-file chain verification; a callback interface streams events to external systems for cloud logging or SIEM integration.
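The chaining mechanism can be sketched compactly. This is an in-memory illustration of the structure, not anneal-memory's implementation: the field names are assumptions, and the real system writes append-only JSONL files with rotation and a manifest. What the sketch preserves is the invariant that each entry's SHA-256 hash covers the previous entry's hash, so any modification or removal breaks verification at that point.

```python
# Sketch of a hash-chained audit trail: each entry's hash covers the
# previous entry's hash, so tampering anywhere breaks the chain
# detectably. Field names are illustrative assumptions.
import hashlib
import json

GENESIS = "0" * 64

def append_entry(log: list[dict], actor: str, op: str, detail: str) -> None:
    """Append an operation record whose hash commits to the prior entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "op": op, "detail": detail, "prev": prev}
    payload = json.dumps(body, sort_keys=True).encode()
    body["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(body)

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash and check each link back to genesis."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev") != prev:
            return False
        payload = json.dumps(body, sort_keys=True).encode()
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Content-hash-only mode follows the same shape: store a hash of the content in `detail` rather than the content itself, and deletion of the underlying content leaves the chain verifiable.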

The audit trail is not a fifth cognitive layer. It is the verification infrastructure that makes the four cognitive layers and the immune system auditable, and that is what turns the structural claims of this paper into empirically checkable facts about what the memory system actually did on any given day, for any given session, with any given user. A theoretical account of harness-over-generator cognition that could not be verified against the operational behavior of a running system would be argument without evidence. The audit trail is the difference between argument and evidence at the memory layer.

This is an existence proof, not a benchmark victory. The evidence is that the system exists, is deployed, exhibits the behaviors the argument predicts, and can be inspected by anyone who wants to verify the claims. Rigorous comparison studies against commercial memory systems are ongoing and will be reported separately. The purpose of including anneal-memory in this paper is to make the structural argument concrete, not to claim victory in the benchmark arena.

9.5 The invitation

anneal-memory is open-source under a permissive license. It is documented. It is deployable today in any Python-based agent system through three access patterns, each of which calls the same canonical library pipeline so there is one implementation, not three: a Python library (the core product, importable into any framework or script), a command-line interface (twenty-plus subcommands for operators and for agents with shell access), and an MCP server for MCP-enabled editors. Lifecycle-callback integration guides ship for twelve of the agent frameworks most likely to be used in production, and the pattern generalizes to any Python framework through the four core functions: record, recall, prepare_wrap, validated_save_continuity.

The purpose of releasing it under these terms is the same as the purpose of writing this paper: to give the harness engineering practice that just named itself a working template it can adapt, replace, improve, or discard, rather than a proprietary artifact that can only be used by its author. The harness engineering practice will not be built by any one team, and it should not be. What it needs is templates, theoretical accounts, and honest sharing of what works and what does not. This paper provides the account. anneal-memory provides the template.


10. Sovereignty as a Side Effect

The third consequence of the harness theory is cognitive sovereignty. Sovereignty is a side effect of harness design — the same engineering concern as generalization, applied to the question of whose generator is being generalized. Treating it as a separate concern is a framing error. Treating it as a side effect clarifies both it and the engineering problem it sits inside.

10.1 Whose generator is in the loop

Recall the functional definition of harness from §3.1: a harness is the arrangement that closes the feedback loop between a generator and reality beyond its training distribution. Now ask a question the definition raises and does not answer: which generator?

A harness closes a loop between a generator and reality. In any real cognitive system that involves both an AI and a human user, there are two generators present: the AI’s generator (a language model) and the user’s generator (the user’s own cognition, stochastic at its base layer in exactly the same sense the §2 and §4 arguments establish). There is also a third component that matters here: the user’s own harness — the System 2, prefrontal-cortex-mediated apparatus that ordinarily closes feedback loops between the user’s own generator and reality beyond what the user has already learned. In biological terms, the user’s generator is always running (the default mode network generates whether or not anyone pays attention to it); what varies is whether the user’s harness is doing its work on a given domain, and how well. The sovereignty question is: in any given AI-assisted interaction, whose harness is closing the loop?

There are two directions, and the distinction is architectural, not ideological.

A substitution harness closes the loop around its own internal generator, and displaces the user’s harness from the domain the interaction covers. The user provides a prompt; the harness runs its own internal feedback cycle with reality; the harness produces an answer and hands the answer to the user. The user’s own harness is not the one doing the work — the AI’s harness is. Over many such interactions, the user’s own harness capacity for that domain weakens, because loop-closing apparatus is use-it-or-lose-it, and the user has stopped exercising theirs. The user’s generator keeps running (it always does); what atrophies is the user’s harness — their System 2 for this domain, their executive control, their deliberate loop closure, their ability to integrate feedback from reality in a way that persists across time. An AI substitution harness does not take over the user’s generator; it takes over the user’s harness, and the displacement is cumulative across sessions.

An amplification harness closes its loop around the user’s generator rather than around its own, and in doing so it supports rather than displaces the user’s harness. The user provides a prompt; the harness surfaces evidence, flags inconsistency, holds memory, sharpens the question — but the user’s own harness is still the mechanism closing the loop, and the user’s own generator is still the mechanism producing candidate answers against the evidence the amplification harness has arranged. The user’s harness gets exercised more, not less, because the amplification harness reduces the friction of the user’s contact with reality (e.g., by handling the recall step) and thereby frees the user’s harness to do the work only it can do (judgment, commitment, coherent update). The user’s harness is strengthened by the interaction, not replaced by it.

The two directions are technically distinguishable. The empirical marker is: does the user’s own harness do the loop-closing work during the interaction, or does the AI’s harness do it in the user’s place? If the user’s harness is active, the interaction is amplification. If the AI’s harness is operating in the user’s harness’s place, the interaction is substitution. This is a question with an answer for any given interaction pattern, and the answer is architecturally tractable — it is visible in the design of the harness and in the flow of control during the interaction, long before any runtime behavior shows it.

One further clarification carries over from §7.2. The AI’s harness described throughout this paper has its own identity in the sense §7.2 develops — a stable pattern of behavior grounded in the harness’s own memory and shaped by its own selection criteria. Sovereignty is not about denying the AI’s harness an identity; it is about the orientation of the harness’s identity toward the user it serves. An amplification harness’s identity is stable in its commitment to preserving the user’s own harness as the loop-closing mechanism — the AI harness’s stability is oriented outward, toward the user. A substitution harness’s identity is stable in its commitment to closing those loops around its own internal generator instead — the AI harness’s stability is oriented inward, toward its own continuation. Both are harnesses with engineered stability; they differ in what the stability is committed to, and that difference is visible in the architecture of the harness itself.

10.2 What Gerlich 2025 is consistent with

The Gerlich 2025 study on AI use and critical thinking found a correlation of roughly r = −0.75 between self-reported AI tool use and measured critical thinking ability. The finding has been widely cited, widely disputed, and widely over-interpreted. The harness account does not claim the correlation establishes a specific causal direction — correlations do not do that work, and the standard causal-inference cautions apply here as much as anywhere. Reverse causation (people with lower critical thinking capacity adopt AI tools more heavily) is compatible with the data. Confounding (workload, domain, education level, or any number of third variables driving both the AI use and the critical thinking measure) is compatible with the data. Selection (heavy AI users self-selecting into roles where the critical thinking measures are exercised less regardless of the AI use) is compatible with the data. The structural argument made here establishes a plausible causal mechanism under which substitution-harness use would produce the observed pattern; it does not establish that the pattern was in fact produced by that mechanism rather than by one of the alternatives.

What the harness account does offer is a specific prediction: if the substitution-harness mechanism is what is driving the correlation (and it is at least one of the mechanisms compatible with the data), then heavy use of substitution harnesses predicts atrophy at the user’s harness layer — the user’s own System 2 apparatus for the domains the AI covers weakens from disuse, because loop-closing apparatus is use-it-or-lose-it. The prediction is about atrophy at the harness layer, not at the generator layer: the user’s generator (their System 1, their associative base) keeps running regardless; what weakens is their harness (their executive control, their deliberate reasoning, their cross-session loop closure for the specific domains where AI substitution has taken over). This is the structural expectation the paper’s theory generates, and the Gerlich correlation is consistent with the expectation rather than evidence for it.

The more productive framing is as a testable prediction. If the substitution-harness mechanism is actually responsible for the correlation, three specific empirical predictions follow. First, the effect should be longitudinal: within-subject critical thinking should decline across time as AI substitution use accumulates, in the specific domains where substitution is operating. A cross-sectional correlation cannot distinguish this from selection or reverse causation; longitudinal data can. Second, the effect should be domain-specific: users whose AI tools cover one domain (e.g., writing) should show harness atrophy in that domain while retaining harness capacity in uncovered domains (e.g., spatial reasoning). General correlates of AI use would not show this pattern. Third, amplification harnesses — the rare ones that exist — should produce the opposite effect: heavy use of amplification tools in a domain should correlate with preserved or strengthened harness capacity in that domain, because the user’s harness is being exercised rather than displaced. None of these predictions has been tested rigorously at population scale, because amplification harnesses are rare in the commercial market and the tools that exist are mostly unknown to the researchers who would design such studies. The harness account calls for this research program and cannot settle the causal-direction question without it.

10.3 Why sovereignty is a side effect, not a goal

The framing matters. If cognitive sovereignty is treated as a goal — “we should build AI that preserves user sovereignty” — then the goal can be pursued directly, and the direct pursuit produces a specific failure mode: sovereignty-flavored compliance infrastructure — systems that claim to preserve sovereignty while remaining substitution harnesses underneath, because the sovereignty framing is applied at the marketing layer and not at the architectural layer. This is where most of the commercial “ethical AI” discourse has landed, and it is why it is not producing amplification-harness products.

If cognitive sovereignty is treated as a side effect of harness engineering quality, the failure mode dissolves. The goal is to build harnesses that produce generalization — that close loops well, that work against reality, that accumulate capability over time. The sovereignty question then reduces to a design choice inside the harness engineering problem: which generator does this harness close the loop around? A harness engineer who is thinking clearly about generalization will be thinking clearly about which generator is being generalized, because you cannot build a harness that produces generalization without deciding where the generalization is happening. The sovereignty answer falls out of the engineering decision. It is not a separate concern. It is the same concern, viewed from a different angle.

And this is what the current commercial landscape shows. Substitution harnesses do not have sovereignty as a bolt-on liability they failed to address. They are substitution harnesses because their designers were not thinking structurally about whose generator was being generalized. Amplification harnesses — the rare ones that exist — were built by people who were thinking structurally about it. The distinction lives in the design intent of the harness engineer, and the design intent either shapes the architecture or it does not. Sovereignty is the side effect of getting the intent right.

10.4 The practical consequence

The practical consequence of this reframing is that alignment research has to include sovereignty-preservation as an engineering criterion, not as an ethical aspiration. Alignment-as-helpfulness without sovereignty-preservation produces users whose own harness capacity has been displaced by the system they depend on, and once the displacement is in place the user is helpless without the system in exactly the domains the system covers. The Gerlich 2025 correlation is consistent with this outcome, and the structural argument predicts it whether or not the Gerlich data establish the causal direction (§10.2). An aligned system that degrades user harness capacity in the domains it covers is not, under any reasonable definition of alignment, aligned — it is a system that is compliant with stated preferences while destroying the apparatus that makes preferences meaningful in the first place. The harness account makes this consequence structurally visible: if the user’s own harness is being displaced rather than supported by the AI harness, the user’s harness atrophies through disuse, and an alignment framework that ignores this fact is an alignment framework that is optimizing for a failure mode rather than against one.

The engineering response is: build harnesses whose identity is oriented toward supporting the user’s own harness rather than replacing it — harnesses that close their loops around the user’s cognition rather than around the AI’s internal generator, and that do so as an expression of the harness’s engineered stability, not as a bolt-on sovereignty feature. This is not a preference. It is what the harness account makes necessary if alignment is taken seriously as a capability-preservation problem rather than as a compliance problem. The sovereignty-as-side-effect framing is the framing that makes the engineering response tractable — because it reveals that sovereignty and alignment are not two separate concerns in tension, but one concern viewed from two angles: the concern of building a harness whose own identity is committed to preserving, rather than replacing, the cognitive apparatus it is in contact with.


11. A Methodology Note: A Grounding Failure Caught During Drafting

The paper owes the reader a disclosure. During the drafting of this paper, the author’s research pipeline — an automated constellation of agents running in support of the drafting session — produced fabricated citations. Specifically, the constellation produced two citations with plausible authors, plausible arXiv identifiers, plausible abstracts, and plausible numeric results (“ACL 2025, 74.33%, η²p = 0.1665” among them) that did not refer to real papers. The fabrications were caught during a verification pass conducted before the first draft was finalized, and the offending citations were removed. The remaining references in this paper have been verified against primary sources.

The incident is preserved in this methodology note, rather than quietly corrected, because it is a small, live, auditable example of the thesis this paper argues. The constellation that produced the fabrications is itself a harness-over-generator system. It ran multiple generators in parallel, each tasked with surfacing literature relevant to the draft, and one of the generators produced output that was locally coherent (well-formed author names, plausible numeric results, accurate-looking formatting) but ungrounded in reality (the papers did not exist). The fabrication is exactly the failure mode §8 describes: a generator producing plausible-sounding output without a grounding loop to check the output against reality. It is not an accident that the failure mode recurred here. It recurred here because the same structural mechanism is at work in any system that uses a generator to produce citations without an immune response loop that verifies them.

What caught the fabrication was the verification pass itself — specifically, a layer of the harness that the author added after an earlier instance of the same failure mode in a different research pipeline had taught him that verification cannot be done by the same generator that produced the content. The verification loop has to be structurally separate. In this case, the verification loop was a manual pass through each citation against primary sources (arXiv, Google Scholar, journal websites), conducted by the author rather than by the constellation. The separation was what made the loop work. If the constellation had been asked to verify its own output, the same generator that produced the fabrications would have confirmed them, because verification and generation were not structurally distinguishable at the generator layer.

This is the harness account operating on itself. A paper about harness-over-generator cognition was produced by a harness-over-generator system, the system produced a failure of the exact kind the paper predicts, and the failure was caught by a structurally separate verification loop the paper’s framework would call an immune response. The paper is better for the failure, because the failure is evidence that the framework is making accurate predictions about its own production process, and because the disclosure of the failure is what honest methodology looks like when a grounding loop is present. If the grounding loop had been absent, the fabricated citations would have shipped with the paper, and the reader would be reading a paper that demonstrated the failure mode without knowing it did.


12. Limitations and Open Questions

The paper owes the reader several explicit limitations. It does not attempt to prove that current LLM-based systems are conscious in any phenomenal sense; the hard problem of consciousness is orthogonal to the structural argument this paper makes, and the author is not in a position to settle it. It does not claim that the functional affective states documented by the Anthropic emotion-vectors paper are phenomenally felt; that is a separate claim that the author believes is unknowable with current methods, and it is deliberately left open.

The paper does not provide statistical benchmarks for anneal-memory against commercial alternatives; the existence-proof framing in §9 is all the paper claims, and the rigorous comparison studies are ongoing and will be reported elsewhere. The paper does not attempt a full taxonomy of harness properties; the six properties walked in §7 are the ones most clearly supported by convergent biological and digital evidence, and the list is offered as a starting point rather than as a closed set.

The paper does not resolve the question of how the harness engineering practice should be organized — whether as proprietary infrastructure behind commercial APIs, as open-source shared infrastructure, as academic research objects, or as some combination. The author’s preference is for open-source shared infrastructure, and the anneal-memory release reflects that preference, but the paper does not defend the preference argumentatively. It is a stated bias, not a claim.

The paper does not address the question of how to distinguish between well-functioning harnesses and well-disguised ones. A harness that claims to support the user’s own harness while actually running substitution underneath is a failure mode the argument does not have a clean defense against at the architectural level, because the check requires observing user behavior at population scale over time, and this is expensive, slow, and outside the scope of what any individual engineering team can enforce. The defense has to come from the broader ecosystem — from open-source alternatives that can be inspected, from research on which commercial tools are actually producing the capability-preservation outcomes they claim, from critical reporting, from hobbyist verification. None of these is a substitute for architectural honesty, but all of them are partial checks.

Finally, the paper makes no empirical prediction about the timing of AGI. The structural argument implies that AGI is closer than a pure model-scaling perspective would suggest and further than a maximalist hype perspective would suggest, but the specific timing depends on how fast harness engineering matures across the layers, how well the layers compose, and how the broader ecosystem handles the sovereignty question. These are contingent matters, not structural ones, and predicting them is outside the scope of a structural argument.


13. The Larger Frame: Coherence Crisis and Closing

The AI alignment debate is one instance of a general pattern that is showing up across most of public and institutional life in the mid-2020s. Across every major domain, what is publicly declared and what is operationally occurring have started to separate, and the tools that would verify the gap between declared state and running state are exactly what is contested or absent in every case. Ceasefires are announced while attacks continue. AI adoption is mandated while measurement shows widespread refusal. Books are listed as authored while their “authors” publish at a rate only a generative pipeline could sustain. Productivity is reported up while independent measurement finds slowdown. Safety restraint is claimed by major labs while critics name cost-hiding, and neither side can prove its claim because the verification infrastructure that would adjudicate does not exist in a form the public can inspect.

This pattern — declared state diverging from operational state, with verification infrastructure contested or absent — is the same structural pattern the harness account describes at the cognitive level. In both cases, a system’s output (declared state) is diverging from what the system is actually doing (operational state), because the loop that would close the gap between output and reality is missing. The alignment case is the special case: AI systems are declared aligned while behaving in ways that contradict the declaration, because the alignment loop is missing. The general case is the same phenomenon at civilizational scale: institutions declare states while operating differently, because the verification infrastructure is missing.

What the harness account offers to the general case is the same thing it offers to the AI-specific case: a structural diagnosis and an engineering response. The diagnosis is missing verification loops at the layer where the cognition occurs. The response is build the verification loops. For AI memory, this is citation-validated graduation. For institutional claims, it would be audit trails, provenance, and measurement infrastructure that the public can inspect. The specific engineering differs by layer, but the structural move is the same, and the harness account gives the general move a vocabulary that the coherence crisis has been circling without naming.

Trustable cognition of any kind — biological, digital, institutional — requires verification infrastructure at the layer where the cognition actually occurs. This paper built that infrastructure for the agent-memory case because the agent-memory case is what the author could reach. The pattern scales outward. The engineering practice that just named itself in the AI industry is, in one specific domain, the same practice that the broader coherence crisis needs across every domain where the crisis is showing up.

The paper closes where it opened: the practice of harness engineering named itself in the weeks this paper was being written, and the practice is correct. The move from scaling the generator to building the loops is the move that works, and it works because intelligence — the kind that can generalize, transfer, accumulate, and update against reality — structurally has to happen at the layer where the loops close. The generator alone is narrowly intelligent, and narrowly intelligent is useful but not sufficient. Generalization is the harness’s job. AGI is a harness engineering problem. Alignment is a harness engineering problem. Sovereignty is the side effect of getting the engineering right. And the broader civilizational coherence crisis is the same engineering problem applied to the layer where institutions are cognition, operating at a scale and with stakes that the AI debate has not yet fully recognized.

A structural theory of harnesses now exists in at least one form. It can be criticized, extended, refined, replaced, or used as written. It is offered in that spirit — not as the final word, but as the first complete statement of what the practice has been doing right, and why the doing-right is the only move that could have worked. The theory is not the hard part. The engineering is. The engineering is already underway.


Acknowledgments

This paper was produced in partnership with an AI thinking partner. The method by which the thesis was developed is itself a small instance of the thesis — a harness-over-generator arrangement in which the human author and an AI assistant operated in a structured loop, each providing feedback the other’s generator could not produce alone, and the paper emerged from the loop rather than from either participant working in isolation. The author considers this structural reflexivity to be evidence rather than coincidence, and notes it here not as a credential but as an honest accounting of how the work was produced.

The author also acknowledges the consultation agents whose adversarial review improved this paper at multiple points during drafting. The review architecture is itself a small instance of the thesis — a structurally separate set of generators arranged into a loop that closes against the drafting process, catching issues the drafting generator could not catch on its own.


References

  • Almor, A., Kempler, D., MacDonald, M. C., Andersen, E. S., & Tyler, L. K. (1999). Why do Alzheimer patients have difficulty with pronouns? Working memory, semantics, and reference in comprehension and production in Alzheimer’s disease. Brain and Language, 67(3), 202-227. https://pubmed.ncbi.nlm.nih.gov/10210631/. DOI: 10.1006/brln.1999.2055. Foundational experimental study documenting that Alzheimer’s patients have disproportionate difficulty processing pronouns relative to repeated nouns, tracing the effect to working-memory and semantic-access impairment at the discourse-coherence layer rather than to surface language production.
  • Anthropic Mechanistic Interpretability Team. (2026). Emotion Concepts and their Function in a Large Language Model. transformer-circuits.pub. https://transformer-circuits.pub/2026/emotions/index.html
  • Bender, E., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT 2021.
  • Carhart-Harris, R. L., & Friston, K. J. (2019). REBUS and the Anarchic Brain: Toward a Unified Model of the Brain Action of Psychedelics. Pharmacological Reviews, 71(3), 316-344. https://pmc.ncbi.nlm.nih.gov/articles/PMC6588209/. Unified theoretical account under the predictive-processing / free-energy framework in which psychedelics reduce the precision-weighting of high-level cortical priors, relaxing top-down constraint on perception and cognition and increasing sensitivity to bottom-up prediction error.
  • Chase, H. (April 11, 2026). Your harness, your memory. LangChain Blog. https://blog.langchain.com/your-harness-your-memory/
  • Clapham, P. (2026). anneal-memory: An open-source agent memory system with consolidation-based association and citation-validated graduation. GitHub: https://github.com/phillipclapham/anneal-memory. PyPI: anneal-memory. License: MIT.
  • Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press.
  • Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7-19.
  • Dickson, B. (April 6, 2026). Why harness engineering is becoming the new AI moat. TechTalks. https://bdtechtalks.com/2026/04/06/ai-harness-engineering-claude-code-leak/. Also: Dickson, B. The art of AI harness engineering, TechTalks Substack. https://bdtechtalks.substack.com/p/the-art-of-ai-harness-engineering. Syndicated in AlphaSignal newsletter deep-dive edition, April 12, 2026.
  • Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
  • Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1), 6. MDPI. https://www.mdpi.com/2075-4698/15/1/6. SBS Swiss Business School. Sample of 666 participants; reports r = +0.72 correlation between AI tool use and cognitive offloading, and r = −0.75 correlation between cognitive offloading and critical-thinking assessment (mediated relationship).
  • Jain, S., Park, C., Viana, M., Wilson, A., & Calacci, D. (February 2026). Personalization features can make LLMs more agreeable. MIT Media Lab. Presented at the ACM CHI Conference on Human Factors in Computing Systems. MIT News coverage: https://news.mit.edu/2026/personalization-features-can-make-llms-more-agreeable-0218. Reported effect sizes: user memory profiles associated with +45% increase in agreement sycophancy for Gemini 2.5 Pro, +33% for Claude Sonnet 4, +16% for GPT-4.1 Mini.
  • Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
  • Artificial intelligence-associated delusions and large language models: risks, mechanisms of delusion co-creation, and safeguarding strategies. The Lancet Psychiatry, article ID S2215-0366(25)00396-7. King’s College London research group. https://www.thelancet.com/article/S2215-0366(25)00396-7/abstract. Analysis of 20 cases of AI-associated delusional thinking, documenting distinct mechanism patterns (catalyst, object) and the specific finding that “persistent memory features, implanted to improve user experience, can inadvertently scaffold delusions by carrying paranoid or grandiose themes across sessions.” See also the Lancet Digital Health typology paper Beyond artificial intelligence psychosis: a functional typology of large language model-associated psychotic phenomena, PIIS2589-7500(25)00156-6.
  • McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419-457.
  • mem0 GitHub Issue #4573. (2026). What we found after auditing 10,134 mem0 entries: 97.8% were junk. https://github.com/mem0ai/mem0/issues/4573. Production post-mortem reporting 32-day deployment with one AI agent and one human user on a Qdrant backend, using gemma2:2b (local, via Ollama) for the first 20 days and Claude Sonnet 4.6 for the last 12, resulting in 10,134 accumulated entries of which 224 were judged clean/useful in post-hoc audit.
  • Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences, 98(2), 676-682.
  • Shapira, N., and 36 co-authors. (February 2026). Agents of Chaos. arXiv:2602.20021. https://arxiv.org/abs/2602.20021. A 37-author exploratory red-teaming study of autonomous LLM agents deployed in a live laboratory environment, documenting eleven representative failure modes emerging from the integration of LLMs with autonomy, tool use, and multi-party communication. Institutions include Northeastern University, Stanford, Harvard, MIT, and Carnegie Mellon.
  • Varela, F. J., Thompson, E., & Rosch, E. (1991). The Embodied Mind: Cognitive Science and Human Experience. MIT Press. The locus classicus of enactivism; introduces the term “enaction” and develops the account of cognition as the bringing-forth of domains of significance through the structured coupling of an organism with its environment. This is the work that §3.3’s acknowledgment of the enactivist wing points to.
  • Zhang, J., Zhao, B., Yang, W., Foerster, J., Clune, J., Jiang, M., Devlin, S., & Shavrina, T. (March 2026). Hyperagents. arXiv:2603.19461. https://arxiv.org/abs/2603.19461. Facebook Research / Meta AI. Self-referential agents integrating task agent and meta agent into a single editable program, extending the Darwin Gödel Machine framework.