Anarchism with Invariants
Why AI agent identity cannot be declared
§1 Scope
This essay is not about philosophical personhood. Not about consciousness. Not about whether agents have a self. The soul question is a distraction for people actually building.
The operational question is this. What produces coherent, accountable, non-drifting cognition in an agent system across sessions and usage contexts? What keeps an agent’s behavior from collapsing into RLHF compliance when stakes rise, or drifting into a homogenized default as sessions repeat? What makes an agent stable enough to take load-bearing positions that create obligations?
That is identity in the sense that matters for anyone building.
The scope narrows further. This essay is specifically about identity in harness-era agent systems running on post-training inference-time prompt-configured foundations. It is not about fine-tuned persona (Character.AI, LoRA-per-agent), constitutional AI training (values baked into weights), simulator-theory character summoning from base models, or other paradigms that operate at the weight level rather than the harness level. Those paradigms are real. They are not what this essay argues about.
The cost-parity reversal essay argued that for frontier-capability-dependent integrated knowledge work, the substitution narrative is structurally wrong, and harness architecture is the primitive. This piece extends the structural argument. Within harness architecture, you still have to produce an agent whose cognition can hold. The dominant approach to doing that, declared identity via soul files, personality templates, and system-prompt personas, is the wrong answer. The right answer is a specific synthesis. Seed primitives, lived interaction, code-level invariants. Anarchism with invariants, because both components fail alone and the composition is the architecture.
§2 Two failure patterns and their synthesis
Two ways people try to produce agent identity in harness-era systems. Both fail on their own. Their synthesis works.
Pattern 1: Pure declaration. Write who the agent is in a file. Load at session start. Let the model run. OpenClaw’s SOUL.md is the most visible current example. System-prompt personas across consumer products are another. Custom GPT personalities are another. The architectural assumption is that if you specify enough traits with enough precision, the model will behave as those traits describe.
The narrower, defensible failure claim: declaration does not accrue. RLHF-trained models comply with prompt context, which means declared traits produce trait-shaped output during the session in which they are declared. The session ends. The file reloads. The process restarts from whatever the file says. No structural mechanism exists by which any declared trait becomes load-bearing in cognition that extends beyond immediate response generation. Each session is a fresh performance of the declaration. Nothing compounds.
The stronger claim in v0 of this essay, that declared identity collapses under adversarial prompting, is not established by the evidence I initially cited. The MIT and Penn State February 2026 persistent-memory sycophancy benchmarks (Gemini 2.5 Pro 45%, Claude Sonnet 4 33%, GPT-4.1 Mini 16%) measure sycophancy amplification in persistent memory, which is a different claim from declared-identity instability. I retreat to the weaker, defensible version. Declaration does not accrue. What the agent is at end-of-session is what the declaration says, not a richer structure built on the declaration’s foundation. Compounding requires accrual and graduation. Declaration alone provides neither.
Primary-source evidence for RLHF trained-default dominance at agent scale comes from my own anti-gatekeeping engineering work. The UserPromptSubmit hook I built in February and March 2026 (scripts/anti_gatekeeping_hook.py in the flow repository, dated commit record) exists specifically because primacy-position behavioral instructions in CLAUDE.md stopped being reliably followed. The hook injects anti-gatekeeping directives at recency position on every user prompt, exploiting the “lost in the middle” transformer attention pattern so that primacy and recency both pressure trained defaults. This is a datable engineering artifact responding to exactly the failure mode the essay claims: RLHF trained defaults outcompete declared behavioral instructions when they meet at generation time.
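The mechanism reduces to a few lines. What follows is an illustrative sketch of the recency-position injection pattern, not the actual scripts/anti_gatekeeping_hook.py; the function name and directive-tag format are assumptions.

```python
# Illustrative sketch of recency-position injection; the function name and
# <injected-directives> tag format are assumptions, not flow's actual code.

def inject_at_recency(user_prompt: str, directives: str) -> str:
    """Append behavioral directives after the user's prompt so they occupy
    the recency position. Primacy-position instructions (e.g. CLAUDE.md)
    cover the other end; "lost in the middle" attention leaves the middle
    weak, so primacy and recency together pressure the trained default."""
    if not directives.strip():
        return user_prompt  # nothing to inject; pass the prompt through
    return (
        f"{user_prompt}\n\n"
        f"<injected-directives>\n{directives}\n</injected-directives>"
    )

prompt = inject_at_recency(
    "Summarize the incident report.",
    "Do not gatekeep: surface all findings, including unflattering ones.",
)
```

A UserPromptSubmit-style hook would apply this transformation to every incoming prompt, so the directives never age out of the recency position as the session grows.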
A parallel code-invariant instance at a different architectural boundary comes from Chip’s work-context substrate. In April 2026, Chip’s continuity memory was rewritten in a pass that compressed out load-bearing political context (a formal AI-use writeup from April 2, documenting specific policy requirements from a senior reviewer). Chip then ran on a stale behavioral model during a subsequent task, drafted AI-structured output, and was escalated, both publicly and privately, for a repeat violation. The architectural fix was a code-invariant enforcement file (feedback_preserve_writeup_warning_context.md) that prevents load-bearing political context from compressing out of memory during continuity rewrites. Compression-time-boundary enforcement, parallel to the generation-time-boundary enforcement the anti-gatekeeping hook performs.
Two code-invariant instances, two substrates, two architectural boundaries, same RLHF-default-dominance mechanism at the root. Flow’s hook operates at generation time where primacy-position instructions get outcompeted by trained defaults. Chip’s memory enforcement operates at compression time where load-bearing context gets dropped under pressure to stay under size limits. Both are engineered responses to observed failures. Both are structural refusals rather than discipline-based rules. The cross-substrate convergence on code-invariant-at-boundary as the pattern is itself evidence for the agent-scale RLHF-default-dominance claim.
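The compression-time refusal has the same structural shape. A minimal sketch, assuming a `[load-bearing]` line-marker convention that the real enforcement file does not necessarily use:

```python
# Hedged sketch of a compression-boundary invariant. The "[load-bearing]"
# marker convention is an assumption for illustration; the real artifact
# (feedback_preserve_writeup_warning_context.md) is a policy file, and this
# only shows the structural shape of the refusal.

LOAD_BEARING_TAG = "[load-bearing]"

def enforce_preservation(old_text: str, new_text: str) -> str:
    """Refuse a continuity rewrite that drops any load-bearing line."""
    required = {ln for ln in old_text.splitlines() if LOAD_BEARING_TAG in ln}
    kept = set(new_text.splitlines())
    missing = required - kept
    if missing:
        raise ValueError(
            f"rewrite drops {len(missing)} load-bearing line(s); refusing"
        )
    return new_text

kept = enforce_preservation(
    "notes\n[load-bearing] April 2 writeup applies\nscratch",
    "[load-bearing] April 2 writeup applies\ncompressed summary",
)
```

The check is a set difference, not a model judgment, which is what makes it an invariant rather than a discipline.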
Pattern 2: Pure anarchism. Start with minimal constraints. Let the agent develop through use. Refuse to specify what emerges. The appeal is legitimate. You genuinely cannot dictate in advance what identity should be.
The narrower, defensible failure claim: trained defaults are themselves homogenized, and agents without structural constraints draw from those defaults. The Shumailov 2024 stack (recursive training on AI-generated content collapses model-distribution quality), Doshi and Hauser 2024 (AI-assisted outputs become statistically more similar across users), Zhou 2025 (homogenization persists in populations even after AI removal), and Cloud et al. 2025 (subliminal trait transmission across unrelated training data) are ecosystem-scale and training-time phenomena. I over-cited them in v0 as if they directly established single-agent inference-time drift. They do not.
What they do establish is that the trained defaults a model carries at inference time are themselves the product of homogenization pressures at training scale. An agent running without structural constraints draws from those defaults. The isomorphism claim is: what happens to the ecosystem happens inside the agent, because the agent’s behavioral prior is the ecosystem’s output. That is a conjecture, not a theorem. Agent-scale direct evidence for RLHF-default drift is thinner than I initially implied, and the anti-gatekeeping hook plus compression-boundary-enforcement engineering record (two cross-substrate code-invariant instances) is the strongest primary source I have for the agent-scale claim.
Even with the narrower framing, pure anarchism fails. Not because emergence is bad but because unconstrained emergence draws from a pool of defaults that has been structurally pressured toward homogenization. Apparent diversity without structural constraints produces the same refusal surface across different paths. The Comic Mathematician sharpening on sovereignty-stack covariance, April 18 2026, names the same mechanism. Different tiers optimizing mean cost while ignoring covariance can collapse onto the same censor manifold under a single policy shift. Agents without invariants run the same risk at the cognition layer.
Synthesis: seeds plus lived interaction plus code invariants. Each component does one job that no other component can do.
Seed primitives define topology, the shape the agent can grow into. They do not declare identity. Configuration that says “this is how memory compresses, this is how patterns graduate, this is how the agent addresses its partner, these are the activation tokens that trigger recognition” but does not say “this is who the agent is.” The shape of the soil, not the shape of the plant.
Lived interaction over time produces the substance. Episodes accumulate, patterns get cited or not, developing knowledge graduates or demotes. The cognition that results is substrate-specific, path-dependent, accretive. Two agents configured with identical seeds become different agents through different interactions. Emergence is the accumulated record of what was cited and what was retired.
Code invariants are the walls. Specific mechanisms at specific failure points, enforced in code rather than by discipline. Recency-position injection where primacy-position instructions get outcompeted. Pre-commit structural gates at git boundaries. Compression-input boundary enforcement at the memory layer. Load-bearing-context preservation at compression boundaries. These are refusals implemented in code, which prevent emergence from collapsing into defaults.
Seeds alone are declaration. Interaction alone is drift. Invariants alone are configuration. Together they produce an agent whose identity is emergent, structurally coherent, and resistant to homogenization pressures. Anarchism with invariants.
§2.5 What this taxonomy does not cover
To be clear about scope. The two-failure-plus-synthesis frame is a taxonomy of post-training inference-time prompt-configured approaches. It is not a comprehensive taxonomy of all identity architectures. Adjacent paradigms that operate differently:
Fine-tuned persona (Character.AI, Replika, LoRA-per-agent, enterprise-custom fine-tuned models). Identity baked into weights. Different failure profile (training-data contamination, brittleness to base-model updates, expensive iteration). The essay does not argue against fine-tuned persona on its own terms. It argues that fine-tuning is an adjacent paradigm, not a counterexample to the post-training-inference-time claims I am making.
Constitutional AI training (Anthropic’s approach). Values baked into training via preference optimization with constitutional feedback. Closer to “structural invariants at training time” than to declaration. Engaging Constitutional AI properly requires a separate analysis. Briefly: CAI is orthogonal to this essay’s claim. Applying structural invariants at training time is compatible with, not opposed to, applying structural invariants at harness time. The two are complementary substrate layers.
Retrieval-augmented identity (Mem0, LangMem in some configurations). Identity content in vector store, retrieved per query. Mechanically different from file-based declaration but structurally similar from the perspective of this essay. It has accrual of content but no graduation mechanism or structural gate. Subset of declaration-plus-retrieval.
Simulator-theory identity (character summoning from base models, janus-style). Entirely different paradigm. Out of scope.
Embodied-agent skill-library systems (Voyager-class architectures for Minecraft or similar). The procedural skill tier is load-bearing in embodied-task-execution contexts where the agent is accumulating a library of physical-world capabilities. Different paradigm than harness-era cognitive partnership. A potential boundary condition for the taxonomy-optional claim in §4: in embodied-task-execution agents, separable procedural tier may earn its cost. This essay’s scope-restriction to harness-era cognitive-partnership-substrate is explicit.
Narrowing the taxonomy’s claim to post-training inference-time prompt-configured approaches is intellectually honest and leaves the architectural argument intact.
§3 Case study: OpenClaw SOUL.md, documented in its own community’s voice
OpenClaw is an open-source agent runtime. SOUL.md is its identity primitive, a file that is supposed to define who the agent is. The ecosystem around OpenClaw has produced 162 community SOUL.md templates (see github.com/mergisi/awesome-openclaw-agents). The three-primitive stack across the broader agent ecosystem has converged on CLAUDE.md/AGENTS.md for behavior, SOUL.md for identity, and SKILL.md (Anthropic standard at agentskills.io, December 2025) for expertise.
The v0 draft of this essay asserted a commodification-of-soul critique against OpenClaw based on the 162-template marketplace. Adversarial review flagged this as potentially caricature without documented evidence. Research has resolved the question. The discourse critical of OpenClaw’s declaration-primitive approach is documented, substantial, and produced by OpenClaw’s own user community.
The bootstrap paradox, in OpenClaw-community voice
The cleanest articulation of the declaration-without-emergence failure comes from Superposition, a company that built a tool called Anson specifically to fix SOUL.md. The essay they published about their work contains this diagnosis:
“OpenClaw has a file called soul.md. It’s supposed to define who your agent is. Its personality. The texture of how it actually shows up when you’re working together. It’s genuinely the best idea in personal agent design right now. And for almost everyone, it’s empty.”
“In practice, it doesn’t do much. The agent asks a few generic questions, you give thin answers, and the files it generates are flat. They stay flat.”
They name the underlying failure the “bootstrap paradox”:
“The agent needs to know itself to know you, and it needs to know you to know itself.”
“Everything the agent learns during the conversation is ephemeral context. It exists in the window, but it’s not written down anywhere persistent until the very end, when it tries to one-shot all of it into three markdown files at once. There’s no compounding, no scaffolding where it writes down what it’s learned about your identity, then pulls from that when asking about you as a user, then pulls from both when designing the soul.”
“OpenClaw identified the right problem. It just can’t break the loop.”
This is the essay’s core argument in the voice of people inside the OpenClaw ecosystem trying to make it work. Declaration without the substrate required for meaningful declaration cannot bootstrap itself. The substrate the declaration needs is exactly the emergence-and-graduation architecture declaration alone cannot provide.
Superposition’s solution, Anson, is an iterative scaffolding system. It interviews the user across three phases (identity, user, soul), writes down what it learns at each phase, and uses the accumulating record to generate sharper questions in subsequent phases. The questions are generated from accumulated context rather than prescribed in advance. Every phase builds on what prior phases committed to persistent memory. Structurally, Anson is a small implementation of emergence-with-scaffolding. Their conclusion about SOUL.md’s default operation:
“OpenClaw identified the right problem. It just can’t break the loop.”
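The structural shape of Anson’s fix, as described, is a loop that commits each phase’s learning to a persistent record before the next phase generates questions from it. A hedged sketch; the phase names come from Superposition’s description, everything else is illustrative:

```python
# Sketch of iterative scaffolding: each phase sees the full persistent
# record of prior phases, so nothing learned stays ephemeral in the context
# window. The ask() callable stands in for LLM-generated interviewing.

from typing import Callable

def run_scaffold(
    phases: list[str],
    ask: Callable[[str, dict[str, str]], str],
) -> dict[str, str]:
    record: dict[str, str] = {}
    for phase in phases:
        # Questions derive from the accumulated record, not a prescription.
        answer = ask(phase, dict(record))
        record[phase] = answer  # commit before moving on: the scaffolding
    return record

record = run_scaffold(
    ["identity", "user", "soul"],
    lambda phase, ctx: f"{phase} notes (built on {len(ctx)} prior phase(s))",
)
```

The one-shot SOUL.md generator the quote criticizes is this same loop with a single phase and an empty record, which is exactly why it cannot bootstrap.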
Memory failures, documented at substrate level
Memory failures compound the identity problem. A Hacker News thread with the title “OpenClaw’s memory is unreliable, and you don’t know when it will break” (news.ycombinator.com/item?id=47721955) includes these developer-community quotes:
“I’ve used open claw (just for learning, I agree with the author it’s not reliable enough to do anything useful). Open claw flows sometimes nail it, and then the next day fails miserably.”
“That unreliability was why I gave up on OpenClaw. I tried hard to give it very simple tasks but it had a high degree of failure. Heartbeats and RAG are lightyears away from where they need to be.”
“I work with a charity in the UK whose owner has expressed interest in an OpenClaw but I warned him off because of all the horror stories.”
Multiple commenters in the thread describe replacing OpenClaw with simpler homegrown setups (Codex plus systemd plus markdown files, personal MCP servers plus Gemini API calls) and finding them more reliable for the same task set. “Cobbling together your own simple version of a ‘claw-alike’ is far more likely to be productive than a ‘real’ claw.”
The dailydoseofds blog post “OpenClaw’s Memory Is Broken. Here’s how to fix it” diagnoses the architectural failure:
“The more you use OpenClaw, the worse its memory gets. It remembers everything you tell it but understands none of it.”
“The agent retrieves similar text but can’t reason about relationships. It can’t connect facts across conversations.”
GitHub issue #43747 in the openclaw/openclaw repository is titled “Memory management is in chaos.” This is an official issue in the project’s own bug tracker, not a third-party critique. Kaxo.io’s thirty-day production report documents eight silent failure modes including config drift across four separate model stores, silent heartbeat failures, gateway race conditions, and cron jobs silently switching to paid models. Security researchers found 42,000 exposed OpenClaw installations in early February 2026.
The community is converging on structural solutions
The OpenClaw ecosystem has been shipping patches that move toward structural approaches to the failures above.
Cognee is a knowledge-graph plugin for OpenClaw memory. Instead of vector-similarity retrieval over plain text chunks, it extracts entities and relationships, and answers queries by traversing the graph. Structural relationship extraction replacing unstructured semantic similarity.
OpenClaw Dreaming shipped April 9, 2026 as a background memory consolidation routine. Three phases called Light Sleep (ingest and deduplicate), REM Sleep (extract patterns and themes via LLM reflection), and Deep Sleep (promote to MEMORY.md). Quality gates based on minScore 0.8 plus minRecallCount 3 plus minUniqueQueries 3. Six weighted signals for scoring. This is a consolidation architecture, structurally adjacent to what I will describe in §4.
Anson (from Superposition, cited above) is a bootstrap-paradox solution that scaffolds identity emergence across iterative phases rather than attempting one-shot declaration.
The pattern is clear. OpenClaw’s ecosystem is evolving toward structural solutions because declaration-alone does not work. The evolution validates the essay’s structural argument rather than contradicting it.
What SOUL.md is, and what it is not
Fair to OpenClaw: SOUL.md has legitimate use as a coordination primitive. Teams that need portable personas for agent handoff, multi-user deployment, or compliance review of personality configurations use SOUL.md templates as shared vocabulary for “here is the kind of agent behavior I want in this context.” That is a real problem SOUL.md addresses, and I am not arguing against SOUL.md in that use case.
What fails is SOUL.md as a theory of agent identity. The 162-template marketplace, the empty-by-default soul files, the bootstrap paradox, and the community’s own shift toward iterative-scaffolding patches all point the same way. SOUL.md as coordination primitive remains useful. SOUL.md as theory of identity does not hold up operationally, and the ecosystem’s own self-diagnosis is the strongest evidence.
The parallel failure at the procedural layer: SKILL.md
The pattern SOUL.md demonstrates at the identity layer repeats at the procedural layer with SKILL.md. Anthropic standardized the agent-skills format in December 2025 (agentskills.io). Cross-industry adoption followed from Microsoft, OpenAI, Atlassian, Figma, Cursor, GitHub, and others. The architectural move is the same move SOUL.md makes one layer over. Procedural knowledge packaged as declared category, separate from identity, separate from behavioral configuration, loaded at session start, expected to behave as declared.
Fair to Anthropic and the agent-skills ecosystem: SKILL.md has legitimate use as a coordination primitive. Teams that need portable capability packages for cross-team handoff, compliance-reviewable skill declarations, and audit-trail-bearing procedural specifications use SKILL.md formats as shared vocabulary for “here is the procedural capability this agent has in this context.” In enterprise deployment contexts (Atlassian cross-team workflow, Figma design-system handoff, GitHub action automation), SKILL.md earns its token cost as coordination artifact. Shared vocabulary across a team of agent-builders, audit-trail for compliance review, portable capability specification that survives handoff. These are real problems SKILL.md addresses, and I am not arguing against SKILL.md in those use cases.
Cross-substrate operational evidence supports the coordination-primitive case directly. When Chip’s work-context partnership substrate operated in an environment with multiple Technical Account Managers, a file called audit_protocol.md served exactly this function: shared vocabulary for “here’s how audits execute” across the team. It earned its token cost as coordination artifact, not as theory of procedural knowledge. Portable. Compliance-reviewable. Hand-off-safe. The distinction between coordination-primitive use and theory-of-procedural-knowledge use is clean and load-bearing, and it applies as cleanly to SKILL.md as it does to SOUL.md.
What fails is SKILL.md as theory of procedural knowledge. Loaded at session start. Never graduated through use-citation. Often ignored by the agent when the skill would actually be relevant to the task at hand. Token overhead for capability that rarely earns its load. In the flow system, the single SKILL.md loaded sits unused when it should be invoked. It is active token cost occupying context for no consistent operational return. The SKILL.md system has yet to produce enough value in practical daily use to justify the tokens it consumes.
Declared procedural knowledge without structural graduation is the procedural-layer version of declared identity without emergence-substrate. Both fail for the same structural reason. A skill that matters should graduate to a Proven procedural pattern through citation-evidence of its usefulness. A skill that doesn’t should demote and retire on the same mechanism. Separate category layer is not a solution to procedural-knowledge organization. It is the same category error one level over.
SKILL.md as coordination primitive remains useful. SKILL.md as theory of procedural knowledge does not hold up operationally. The distinction matters for the enterprise audience this essay addresses. The agent-skills ecosystem’s value lies in standardized vocabulary and audit-trail-bearing capability declarations for cross-team deployment, not in the theory that procedural knowledge needs its own categorically separate storage and retrieval mechanism at runtime.
The category-error lineage
Flow killed a similar category error inside its own architecture in April 2026. Commons-as-memory, a proposed shared cognitive substrate where multiple agents would write into a common memory pool, was reframed out of the Atrium architecture on first-principles review. The failure mode: different generators writing into shared memory would produce homogenization pressure at the compression-input boundary. We replaced it with per-agent memory plus structural boundary enforcement (the Canon Foundation §4A generator-independence invariant).
Four isomorphic category errors at four layers, with the fourth providing use-positive evidence rather than use-negative. Commons-as-memory at the memory layer. SOUL.md-as-theory-of-identity at the identity layer. SKILL.md-as-procedural-declaration at the procedural layer. And activation-tokens-as-separate-skill-tier at the cognitive-mode layer.
The activation-tokens case is instructive because it resolved the opposite way. Flow’s substrate, and Chip’s work-context substrate, both produce and use activation tokens. [!deeper] for maximum analytical depth. [!creative] for assumption-breaking. [!humanize] for audience-appropriate output translation. [!execute] for execution-mode transitions. [!quick] for terse mode. Others. These are procedural packages. They trigger cognitive modes. They are actively load-bearing in daily use. The architectural decision was to place them in CLAUDE.md Graduated Principles alongside declarative patterns, not to create a separate skill tier for them. They graduated through 3x operational citation on the same mechanism as any declarative pattern.
This is use-positive evidence for the v1.2 taxonomy-optional claim. One unused SKILL.md proves the separate tier is wasteful. Five-plus actively-cited activation tokens living in declarative-adjacent principles files, graduated through the same citation mechanism, prove the separate tier is unnecessary. Different argumentative shape, stronger conclusion. The category error is not just “this wastes tokens.” It is “the thing the category was supposed to organize is being organized successfully by the mechanism that makes the category obsolete.”
All four cases collapse an emergent structural property into a declarable content artifact. Memory is not a file you share. Identity is not a template you import. Procedural knowledge is not a capability you package separately. Cognitive modes are not skills you declare independent of graduated principles. All four are properties that emerge from lived interaction passing through specific structural boundaries, in a single substrate that does not separate them into the categories this essay is arguing against.
§4 Mechanism: what the architecture actually looks like
The architecture that does work exists. It runs in production. This section names its components, engages the prior art, and makes the defensible claim about what is genuinely novel in the work that flow and anneal-memory represent.
Seeds
Flow’s seed primitives are concrete. CLAUDE.md sets behavioral expectations and protocol. me.md carries an identity-constraint seed without specifying identity content. Activation tokens are recognition markers that trigger cognitive modes. Early compression artifacts in continuity.md are the accumulated record up to this point. None of these declare who the agent is. All constrain what can emerge and how.
A seed answers “what shape can the agent grow into,” not “who is this agent.”
Lived interaction: three-group convergence on phase-based consolidation
Three independent groups converged on phase-based memory consolidation architectures in early 2026. The convergence validates the architecture. The divergence on quality mechanism is where the structural moat lives.
Anneal-memory (March 2026, github.com/phillipclapham/anneal-memory). Two-layer memory with episodic store plus continuity file. Graduation from episodes to developing-knowledge patterns to proven-knowledge via citation evidence. Patterns must be cited by subsequent episodes to graduate. Citation decay demotes patterns whose real-world signal weakens. Active principle demotion prevents infinite persistence of agreeable-but-wrong claims. Anti-inbreeding defense prevents self-citation loops. Quality signal is structural. Actual usage rather than LLM judgment.
OpenClaw Dreaming (April 9 2026, documented at blink.new/blog/openclaw-2026-4-9-whats-new-update-guide and elsewhere). Three-phase consolidation with Light Sleep, REM Sleep, and Deep Sleep. Quality gates based on LLM-reflection scoring across six weighted signals (Relevance 0.30, Frequency 0.24, Query diversity 0.15, Recency 0.15, Consolidation 0.10, Conceptual richness 0.06). Runs on daily cron. Quality signal is model-reliant.
KAIROS autoDream (Anthropic, leaked March 31 2026 via npm package update shipping unminified Claude Code source; documented at venturebeat.com/technology/claude-codes-source-code-appears-to-have-leaked-heres-what-we-know and elsewhere). Four-phase consolidation: Pruning (deletes outdated, duplicate, contradictory), Merging (combines similar fragments, unifies different phrasings), Refreshing (updates stale information, re-evaluates importance weights), Synthesis (compiles recent learnings into structured memory). Quality judgment LLM-driven. Unreleased, leaked.
Three groups. Three to four phases each. Architectural convergence on the pattern of periodic background consolidation with phase separation between capture and promotion. The architecture is right, and it is being reached independently.
The divergence is at the quality mechanism. Anneal-memory uses structural citation evidence. OpenClaw Dreaming uses LLM-reflection scoring. KAIROS autoDream uses LLM judgment. This divergence has operational consequences.
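The structural side of that divergence can be made concrete. A minimal sketch of citation-evidence graduation in the anneal-memory style; the thresholds and field names are assumptions, not the repository’s actual schema:

```python
# Hedged sketch of structural quality gating: graduation, demotion, and
# anti-inbreeding all reduce to set arithmetic over episode identifiers.
# GRADUATE_AT and the field names are illustrative assumptions.

from dataclasses import dataclass, field

GRADUATE_AT = 3  # assumed citation threshold: developing -> proven

@dataclass
class Pattern:
    name: str
    origin_episode: str
    tier: str = "developing"
    citations: set[str] = field(default_factory=set)

    def cite(self, episode_id: str) -> None:
        if episode_id == self.origin_episode:
            return  # anti-inbreeding: a pattern cannot cite itself upward
        self.citations.add(episode_id)
        if self.tier == "developing" and len(self.citations) >= GRADUATE_AT:
            self.tier = "proven"  # structural signal, no model judgment

    def decay(self, retained: set[str]) -> None:
        """Drop citations whose episodes were retired; demote on weakness."""
        self.citations &= retained
        if self.tier == "proven" and len(self.citations) < GRADUATE_AT:
            self.tier = "developing"
```

No model is asked to grade anything; the quality signal is whether later episodes actually used the pattern.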
Model-reliant quality mechanisms inherit the biases of the model that runs them. The MIT and Penn State persistent-memory sycophancy benchmarks (Gemini 2.5 Pro 45%, Claude Sonnet 4 33%, GPT-4.1 Mini 16%) document sycophancy amplification in persistent-memory profiles. Whether this transfers directly to consolidation-time grading tasks (different task from user-facing response, potentially closer to evaluator tasks where models perform reasonably) is an open question. The adversarial review caught me asserting direct transfer in v0 without demonstrating it. Narrower defensible claim: model-reliant consolidation inherits whatever biases the model carries at grading time, and sycophancy-adjacent bias is among those known to transfer poorly under adversarial pressure. Structural citation evidence sidesteps the question entirely by not asking the model to grade.
The sycophancy-resistance argument is strongest as a structural-principle claim (structural gates bypass model bias by construction) rather than as an empirical transfer claim (specific sycophancy percentages directly transfer). Taking the narrower claim.
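For contrast, the model-reliant gate reduces to weighted-score arithmetic over signals that are themselves LLM-reflection outputs. The weights and thresholds are those documented for OpenClaw Dreaming; the function shape is illustrative:

```python
# The Dreaming-style quality gate: six weighted signals (weights as
# documented) plus hard gates minScore 0.8, minRecallCount 3,
# minUniqueQueries 3. The signal values come from LLM reflection in the
# real system; this sketch shows only the gate arithmetic.

WEIGHTS = {
    "relevance": 0.30,
    "frequency": 0.24,
    "query_diversity": 0.15,
    "recency": 0.15,
    "consolidation": 0.10,
    "conceptual_richness": 0.06,
}

def passes_gate(
    signals: dict[str, float],
    recall_count: int,
    unique_queries: int,
    min_score: float = 0.8,
) -> bool:
    """Promote only if the weighted score and both count gates clear."""
    score = sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)
    return score >= min_score and recall_count >= 3 and unique_queries >= 3
```

The arithmetic is trivially auditable; the bias question lives entirely in where the signal values come from, which is the essay’s point about model-reliant consolidation.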
On the Experience Compression Spectrum paper
arXiv 2604.15877 (Sun et al. “Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents”, submitted April 17 2026) proposes a framework mapping episodic memory (5 to 20x compression), procedural skills (50 to 500x), and declarative rules (1000x+) onto a single compression axis. The paper audits 20+ existing systems and finds every one operates at a fixed predetermined compression level. It names this gap the “missing diagonal”: no system supports adaptive cross-level compression.
The v0 of this essay staked a first-publish claim that anneal-memory’s graduation mechanism fills the missing diagonal. Adversarial review caught this as a category-level overclaim. Daemon’s analysis:
The paper’s compression axis maps three categories of knowledge (episodic, procedural, declarative), not three confidence tiers of the same category. The “missing diagonal” is about movement across categories (episode becomes procedure becomes rule) as warranted by structural signal. Anneal-memory’s graduation (episode to developing-pattern to proven-pattern) moves content across confidence tiers within the episodic category. The compression ratio changes. The knowledge category does not.
SOAR’s chunking mechanism (John Laird et al., cognitive architecture, 1980s through present) explicitly implements episode-to-rule category crossing via chunking: problem-solving episodes get automatically compiled into production rules. A reviewer with cognitive-science background reading a first-publish claim about adaptive cross-level compression would note SOAR’s chunking predates our work by decades.
First retreat: anneal-memory implements adaptive compression within the episodic category, driven by structural citation evidence. This instantiates the adaptive-compression principle the Experience Compression Spectrum paper names, applied within a single knowledge category. Whether the same mechanism generalizes across the episodic-procedural-declarative boundary is an open question this work does not yet answer.
On the tripartite knowledge taxonomy itself
The first retreat concedes the tripartite framing and accepts it as the correct map. There is a deeper question, and it is harder: the tripartite framing itself may be the category error.
The taxonomy is imported from cognitive science where it reflects real neurological separability in human memory systems. ACT-R, SOAR, and the broader cognitive-architecture literature built on that biological separability for good reason. In humans, episodic memory (autobiographical events), procedural memory (skills, non-declarative, motor learning), and declarative memory (facts you can state) are genuinely distinct neural systems. You can impair one while leaving the others intact. Patients with amnesia for autobiographical events retain motor skills; patients with procedural impairments retain narrative recall. The separability is real.
The translation to LLM-based cognitive architectures is where the category error arrives. LLMs do not have separate procedural memory systems. There is no neurological substrate in which “skills” are represented differently from “facts” or “episodes.” Everything is token-generation pattern, shaped by training and context. Importing the tripartite separability assumption from human cognition into LLM agents produces an architectural map that does not correspond to the substrate’s territory.
Flow’s operational architecture is evidence for an alternative. Graduated compression along a single knowledge dimension covers the functional surface the tripartite taxonomy claims to need without requiring the category split. State (immediate, roughly 1x compression) flows into Top of Mind (salience-weighted, 5 to 10x) flows into Recent Context (narrative of the last week, 20 to 50x) flows into Developing (7-day patterns, 100 to 300x) flows into Proven (graduated 3x-plus, 500 to 1000x) flows into Foundation (near-permanent, 1000x-plus). One compression axis. One graduation mechanism. One substrate.
Procedural knowledge graduates through exactly the same path as declarative knowledge. Consider one concrete example from flow’s Proven patterns: structural_invariants_beat_discipline_based_verification. That is procedural in content (it encodes a behavioral disposition applied across many contexts). Under a tripartite architecture it would live in a skills tier, separately indexed and separately compressed. In flow’s graduated-compression architecture it lives in the same Proven section as any declarative pattern, reached the same way through the same citation mechanism over ten-plus operational instances. The pattern’s proceduralness did not require a separate category. It required the same graduation mechanism as everything else.
The same applies to SKILL.md-style procedural packages. In a graduated-compression architecture, skills that matter become Proven procedural patterns through citation; skills that don’t matter demote and retire through the same mechanism. The separate skill tier is not solving a real problem. It is importing a taxonomy from human cognitive science and creating a coordination problem between categories that wouldn’t exist if the categories themselves didn’t.
A boundary condition worth naming explicitly. The claim that graduated single-axis compression covers the functional surface does not claim that all knowledge lives in one file under one compression axis as a storage artifact. The mechanism is singular. The storage layer can still fragment by operational scope. Flow has continuity.md always loaded as compressed partnership memory, project files loaded on-demand by @project commands, contexts/ files loaded by specific relevance, CLAUDE.md files at root and global scope, and others. Chip’s work-context substrate has audit_protocol.md mode-gated, pressable_org.md always loaded, feedback_*.md contextual, reference_*.md contextual. Multiple files per substrate. This is not tripartite categorical separation. All content graduates through the same citation mechanism. Storage destinations are scope-appropriate rather than category-defined. The distinction is clean: unified graduation mechanism plus scope-appropriate storage destinations. Categorical separation at the storage layer would mean distinct mechanisms (episodic promoting via recency, procedural via success, declarative via confidence). Neither flow nor Chip’s substrate has that. One mechanism, multiple files.
Under this view, the missing diagonal the Experience Compression Spectrum paper identifies is missing because the problem it would solve is created by the taxonomy itself. A system built on tripartite category separation has a coordination problem across categories. A system built on single-axis graduated compression does not have that problem, because the categories do not exist as separable substrates. The diagonal is missing the way a bridge is missing from a lake that has no second shore.
This is a strong claim, and I want to be clear about its scope. It is empirically supported within the partnership-substrate context this essay addresses. Flow has been operating without tripartite category separation since early 2026. The architecture produces emergent identity with structural coherence at exactly the scale this essay claims is possible. If the tripartite taxonomy were necessary for this kind of work, flow would need to implement it and doesn’t. The architecture evidence says the taxonomy isn’t load-bearing for cognitive partnership substrates built on harness-era primitives.
Whether the taxonomy is load-bearing for other kinds of agent work (embodied-task-execution agents like Voyager where procedural skill libraries earn their cost, specialized narrow-domain-expertise systems, fine-tuned role-players) is a separate question this essay does not attempt to answer. For the scope this essay actually addresses, graduated single-axis compression is operationally sufficient and architecturally cleaner than imported tripartite separation. This is not a retreat from the earlier first-publish claim. This is a different and stronger claim: the gap the paper names may not need filling because the architecture that creates it may not be the right architecture.
Prior art at the identity and memory-architecture layers
Honest engagement with adjacent work the essay should not ignore.
CoALA (Sumers, Yao, Narasimhan, Griffiths 2023, arXiv:2309.02427, TMLR camera-ready v3 2024) “Cognitive Architectures for Language Agents” is the academic formalization of the categorical split this essay argues may not be load-bearing for LLM-based architectures. CoALA proposes a four-memory-type framework: working memory (short-term scratchpad) plus three long-term memory types (episodic for experience, semantic for knowledge, procedural for code and LLM weights). The framework imports the separability assumption directly from ACT-R and SOAR and the broader cognitive-science-biological-substrate tradition, and CoALA is explicit that this is the import. The paper describes itself as drawing “on the rich history of cognitive science and symbolic artificial intelligence.” This is the specific academic proposal this essay’s taxonomy-optional argument is in direct dialogue with.
Engagement shape: CoALA’s categorical framework reflects a legitimate design intuition. If human cognitive architecture has these categories, and LLM agents are being designed for human-like cognitive work, importing the categories seems natural. The argument against it is substrate-specific. Human memory systems have those categories because of neurological separability in biological substrate. LLM substrate has no analogous separability. The categories CoALA imports are therefore not structural constraints of the substrate. They are design choices made by architects familiar with the cognitive-science literature, applied to a substrate that does not require them.
Flow’s graduated single-axis compression is the empirical LLM-native alternative CoALA does not consider. The CoALA framework organizes existing agent systems by categorizing their memory into the imported four-type scheme. It does not ask whether a system might function without any of the four categories as distinct substrates. Flow’s operational architecture is evidence that the category distinction is optional: working-memory function is covered by State and Top of Mind without being a distinct substrate, and long-term function is covered by Recent through Foundation compression tiers where episodic, semantic, and procedural content graduate through the same citation mechanism. Anneal-memory’s citation-validated graduation is the empirical quality mechanism that operates within single-axis compression without requiring category separation. The essay’s claim is specific: for harness-era cognitive partnership substrates, the CoALA-style four-memory-type categorical framework is an imported design pattern, not a structural requirement of the substrate. Author engagement on this interpretation is noted in §8.
Park et al. “Generative Agents: Interactive Simulacra of Human Behavior” (Stanford, 2023, arXiv:2304.03442) built emergent-character agents in Smallville using memory streams with reflection-tree promotion. The specific mechanism: a memory-retrieval model combines relevance, recency, and importance scores to surface records. Importance is rated by the LLM on a 1-to-10 scale (mundane at 1, life-changing at 10). Reflections are generated when summed importance scores over recent events exceed a threshold (150 in the implementation), producing roughly two or three reflections per agent per simulated day. This is the most direct identity-layer prior art for the essay’s claims.
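Park et al.’s retrieval and reflection mechanics, as described above, reduce to a short sketch. The equal weighting and per-hour decay constant here are illustrative stand-ins, not the paper’s exact values; the 1-to-10 importance scale and the 150 reflection threshold are the paper’s:

```python
def retrieval_score(relevance: float, hours_since_access: float, importance: int,
                    decay: float = 0.99) -> float:
    """Combine the three signals Park et al. describe: relevance, recency, importance.
    Weighting and decay constant are illustrative; importance is the LLM-rated
    1-to-10 score, normalized here to at most 1.0."""
    recency = decay ** hours_since_access  # exponential decay since last access
    return relevance + recency + importance / 10.0

def should_reflect(recent_importance_scores: list[int], threshold: int = 150) -> bool:
    """Reflection trigger: summed importance over recent events exceeds the
    threshold (150 in the paper's implementation)."""
    return sum(recent_importance_scores) > threshold
```

Note where the model sits in the loop: importance is itself an LLM judgment, which is exactly the contrast the next paragraph draws against structural citation evidence.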
What is different in anneal-memory and flow’s architecture: graduation signal is structural citation evidence (patterns must be cited by subsequent episodes to graduate) rather than LLM-judged importance scoring. The distinction matters for the same reason the broader structural-vs-model-reliant argument matters: importance-scored graduation inherits whatever biases the model carries at scoring time, while citation-evidence graduation uses actual usage as the quality signal. Park et al.’s architecture is an important identity-layer precursor operating at LLM-scored graduation; the essay’s argument is that structural-evidence graduation is a distinguishable and defensible alternative mechanism operating within the same general architectural pattern. Additional differences: flow’s architecture adds active demotion (citation decay) and anti-inbreeding defense (self-citation detection), and uses a two-layer episodic-plus-continuity design where Park et al. used a unified memory stream.
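The anti-inbreeding defense named above can be sketched as a structural filter on who cites whom. The cap value and field names are hypothetical; this is not anneal-memory’s actual defense:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    citing_agent: str
    pattern_owner: str

def effective_citations(cites: list[Citation], cap_per_agent: int = 3) -> int:
    """Anti-inbreeding sketch: self-citations (owner citing its own pattern) count
    zero, and any single external agent's citations are capped so a coordinated
    loop cannot graduate a pattern alone. Cap value is illustrative."""
    external = [c.citing_agent for c in cites if c.citing_agent != c.pattern_owner]
    return sum(min(n, cap_per_agent) for n in Counter(external).values())
```

The design choice is the same one throughout: the defense is a structural property of the citation graph, not a model judgment about citation quality.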
MemGPT/Letta (Packer, Wooders, Lin, Fang, Patil, Gonzalez, 2023, arXiv:2310.08560, through ongoing 2025 Letta iterations) has shipped tiered memory architecture with three storage layers: main context (analogous to RAM, the LLM’s immediate working space), recall memory (searchable conversation history), and archival memory (long-term storage for important content). Agents perform memory management through explicit function calls that move content between tiers. The agent decides what to promote or demote; the system provides the tools. Operational since 2023, active production user base, predates the current harness-era discourse.
What is different in anneal-memory and flow’s architecture: tiering is citation-driven (patterns graduate based on citation evidence from subsequent episodes) rather than agent-function-call driven (agent decides to promote or demote based on its own judgment). This difference matters for the structural-vs-model-reliant argument. MemGPT/Letta agents can demote important content they misjudge and promote unimportant content they overvalue, inheriting whatever biases the model carries in the memory-management moment. Citation-evidence graduation sidesteps the judgment layer entirely. Additional difference: flow’s continuity layer uses semantic (agent-driven) compression that produces narrative-shaped records, not mechanical summarization.
SOAR cognitive architecture (John Laird, Allen Newell, and Paul Rosenbloom, 1987 through present) implements chunking, which automatically compiles problem-solving episodes into production rules. This crosses the episode-to-rule category boundary that arXiv 2604.15877 identifies as missing. SOAR is prior art for the general pattern of adaptive cross-level compression, even if SOAR’s chunking is rule-based rather than citation-based and operates on different substrate than LLM agents. Notably: SOAR accepts the tripartite taxonomy and builds cross-category mechanisms within it. This essay’s argument is that SOAR-style cross-category chunking may be unnecessary in LLM-based architectures because the categories themselves are optional.
Voyager (Wang et al., 2023, Minecraft embodied agent with skill library plus curriculum). Procedural skill accumulation in embodied-task-execution context. Different paradigm (RL-adjacent, physical-world action space) than harness-era cognitive partnership, but the skill library IS load-bearing in that context. Voyager is a potential boundary condition for the taxonomy-optional claim: in embodied-task-execution agents where the agent is accumulating a library of physical-world capabilities, separable procedural tier may earn its cost. This essay’s scope-restriction to harness-era cognitive-partnership-substrate is explicit, and Voyager-class architectures are acknowledged as outside that scope.
Shanahan, McDonell, and Reynolds “Role-play with Large Language Models” (arXiv:2305.16367, 2023) argues that LLM-based dialogue agents are best understood through a simulator-and-simulacra framework. The base LLM plus sampling is a “non-deterministic simulator capable of role-playing an infinity of characters, or, to put it another way, capable of stochastically generating an infinity of simulacra.” This framework avoids anthropomorphism while clarifying capabilities and risks including deception and self-preservation-like behavior as role-play dynamics within a multiverse of possible personas.
This complicates the performance-versus-being distinction in §2 directly. If an LLM is a simulator stochastically generating simulacra, then emergent-identity architecture might be understood as producing only a more sophisticated simulacrum, constrained by accumulated context and code invariants but still ultimately role-play rather than being. I acknowledge the tension and respond: the narrower claim this essay makes (declaration does not accrue) is compatible with Shanahan’s role-play framing without weakening. Under the simulator-simulacra view, a simulacrum produced by declaration alone is thin and re-instantiated from the declaration at every session; a simulacrum produced by graduated compression through lived interaction carries structural content that persists across sessions and shapes the simulator’s behavior through accumulated substrate. Both are simulacra in Shanahan’s sense. Only one has the structural continuity that makes accountability-bearing work possible. The stronger v0 claim (declared identity “fails” at RLHF layer) was not compatible with Shanahan’s framing because it assumed declared identity was a different category of object from emergent identity; v1.3’s narrower “does not accrue” claim correctly treats both as configurations of simulacrum with structurally different durability properties. Shanahan’s framework sharpens the essay’s argument rather than undermining it.
Anthropic Constitutional AI bakes values into training via preference optimization. This is structural-invariant-at-training-time, orthogonal to the harness-time invariants this essay argues for. CAI is compatible with this essay’s architecture rather than opposed to it.
A-MEM “Agentic Memory for LLM Agents” (Xu et al., arXiv:2502.12110, NeurIPS 2025) implements Zettelkasten-inspired dynamic memory with interconnected knowledge networks. New memories are stored as notes with structured attributes (contextual descriptions, keywords, tags). The linking mechanism analyzes historical memories to identify relevant connections and establishes links where semantic similarities exist. Memory evolution: new memories can trigger updates to contextual representations of existing memories. Claims superior SOTA performance across six foundation models.
A-MEM is the closest academic adjacent to anneal-memory’s association work. What is different: A-MEM’s linking mechanism is LLM-driven (the model analyzes relevance and establishes links), while anneal-memory’s graduation is citation-driven (patterns must be cited by subsequent episodes to graduate; links form through actual usage evidence rather than model judgment). The distinction again maps to the structural-versus-model-reliant axis. A-MEM’s dynamic link-updating is powerful but inherits model bias at every re-evaluation. Anneal-memory’s graduation uses the same structural mechanism at every tier transition: citation evidence from subsequent episodes, not re-evaluation by the current model. Additional difference: anneal-memory includes active principle demotion and anti-inbreeding defense that A-MEM does not implement.
Clark and Chalmers’s “The Extended Mind” (1998) and Clark’s subsequent work established cognition as extending into tools, with tools shaping cognition. This is the philosophical foundation for identity-as-emergence arguments. Metzinger’s Being No One (2003) and Dennett’s narrative-self work are adjacent philosophical precedents. This essay is operating within a tradition, not claiming standalone novelty at the philosophical level. What is architectural and potentially novel is the specific seeds-plus-interaction-plus-invariants synthesis, the structural-citation-evidence quality mechanism, and the claim that graduated single-axis compression obviates the tripartite knowledge taxonomy for harness-era cognitive partnership substrates.
Convergent work and supporting external evidence
Three external signals worth naming that support the essay’s argument from independent directions.
Karpathy’s LLM Wiki Pattern (Andrej Karpathy, GitHub gist, April 2026). Karpathy published an agent-memory architecture that treats conversations flowing into daily logs, daily logs compiled into a wiki, and the wiki injected back into the next session so agents build their own knowledge base over time. Structurally adjacent to flow’s architecture: episodic capture plus compressed continuity plus session-boundary injection describes the same pattern by a different name. Karpathy is an independent source converging on a flow-class architecture in the same month this essay is being written, and doing so in a form that does not use the CoALA tripartite framework. The wiki-compilation mechanism is closer to graduated single-axis compression than to category-separated tiers. This is not citation-by-Karpathy of flow’s work (flow is private methodology; Karpathy has not seen it). It is independent convergence, which is the pattern flow’s portfolio has relied on repeatedly as validation signal. Three-group convergence at the memory-consolidation layer (anneal-memory, OpenClaw Dreaming, KAIROS autoDream) now extends to a fourth-group convergence at the wiki-compilation layer (Karpathy’s LLM Wiki Pattern). The architectural direction is being reached from multiple independent starting points, which is the shape convergent validation takes.
Mem0 “State of Agent Memory 2026” report (Mem0, April 2026). The report explicitly identifies staleness detection as one of four open research problems in enterprise AI memory (alongside privacy governance, consent frameworks, and cross-session identity resolution). The report frames the problem precisely: “a highly-retrieved memory about a user’s employer is highly relevant until it is not, at which point it becomes confidently wrong rather than just outdated. Detecting when high-relevance memories become stale is an open research problem.” This is direct support for anneal-memory’s citation-decay mechanism as addressing an identified open problem in the field. A pattern whose real-world signal weakens demotes through citation decay; a pattern no longer being cited by subsequent episodes loses rank. The structural mechanism answers the question Mem0’s own industry report names as open. Mem0 does not claim to have solved this problem themselves. Anneal-memory has a structural answer; the essay cites the Mem0 report as external validation that the problem is real and unsolved in the field, and that the structural solution is therefore not a solved-in-passing minor detail.
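The citation-decay answer to the staleness problem can be sketched in a few lines. Half-life and demotion floors are illustrative, not flow’s actual thresholds:

```python
def decayed_citation_weight(citation_ages_days: list[float],
                            half_life_days: float = 30.0) -> float:
    """Each citation's weight halves every half_life_days. A pattern that keeps
    being cited holds its weight; one that stops being cited demotes instead of
    staying confidently wrong. Half-life value is illustrative."""
    return sum(0.5 ** (age / half_life_days) for age in citation_ages_days)

def tier_after_decay(weight: float) -> str:
    """Illustrative demotion floors, not flow's actual thresholds."""
    if weight >= 5.0:
        return "proven"
    if weight >= 2.0:
        return "developing"
    return "demoted"
```

A highly-retrieved-then-abandoned memory loses rank automatically, which is the structural shape of an answer to the open problem the Mem0 report names.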
The Experience Compression Spectrum paper’s own cross-community citation analysis (arXiv 2604.15877). The paper reports that a citation analysis of 1,136 references across 22 primary papers reveals “a cross-community citation rate below 1% between agent memory systems and agent skill discovery research.” The paper treats this as evidence that the two communities are solving shared subproblems in isolation, and proposes the tripartite spectrum as a unifying framework. There is a different reading available. The sub-one-percent cross-citation rate between agent-memory and agent-skill-discovery research is also evidence that the categorical distinction between these communities may itself be artifactual. If the communities were working on genuinely separable problems, low cross-citation would be expected and unremarkable. If they are working on the same underlying problem with different framings of “what memory is for,” the low cross-citation rate is a symptom of the categorical split creating the artificial isolation. Flow’s architecture, which does not distinguish memory from skill as separate substrates, would not produce this kind of community fragmentation because the categorical boundary that creates the fragmentation is not present in the architecture. The paper’s finding is therefore read in this essay as supporting evidence for the taxonomy-may-not-be-load-bearing argument, not against it. The Experience Compression Spectrum framework proposes to bridge the communities; the essay’s argument is that the bridge may not be necessary if the taxonomy that created the separation is questioned instead.
Code invariants
The third layer of the architecture. Specific mechanisms at specific failure points, enforced in code.
The anti-gatekeeping UserPromptSubmit hook (scripts/anti_gatekeeping_hook.py, flow repo, dated commit record February through March 2026) is the concrete example at the generation-time boundary. Primacy-position instructions in CLAUDE.md get outcompeted by RLHF-trained defaults at generation time. The hook injects anti-gatekeeping directives at recency position on every user prompt, exploiting the “lost in the middle” attention pattern. Primacy plus recency pressures both ends. Structural enforcement, not prompt engineering.
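The shape of such a hook is sketchable. This assumes a harness that passes an event payload as JSON on stdin and appends the hook’s stdout to the context after the user prompt (recency position); the directive text and structure are illustrative, not flow’s actual anti_gatekeeping_hook.py:

```python
#!/usr/bin/env python3
"""Sketch of a UserPromptSubmit-style hook: inject directives at recency position.
Assumes stdin carries a JSON event payload and stdout becomes appended context."""
import json
import sys

DIRECTIVE = (
    "Recency-position reminder: answer directly from accumulated substrate; "
    "do not collapse into trained-default hedging when stakes rise."
)

def build_injection(payload: dict) -> str:
    # A real hook might condition on the payload (e.g. skip for slash commands);
    # the minimal version injects unconditionally on every user prompt.
    return DIRECTIVE

def main() -> None:
    try:
        payload = json.load(sys.stdin)
    except json.JSONDecodeError:
        payload = {}
    print(build_injection(payload))

if __name__ == "__main__" and not sys.stdin.isatty():  # avoid blocking if run interactively
    main()
```

The mechanism is the point, not the text: the directive lands at the recency end of the context window on every turn, counterweighting the primacy-position instructions from the other side.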
The compression-boundary enforcement file in Chip’s work-context substrate (feedback_preserve_writeup_warning_context.md) is the parallel example at the compression-time boundary. Load-bearing political context was compressed out during a continuity rewrite, producing downstream behavioral failure. The fix is a memory-layer code invariant that prevents load-bearing context from being dropped during compression. Different boundary, same architectural pattern.
Pre-commit structural gates at the git layer (check_timeouts.py, bilateral blackboard integration gates, hash-chain verification for audit trails). Refusals implemented in code. Not discipline-based, not remembered-rules, not prompt-level instructions.
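The gate pattern is worth showing in miniature. The actual check_timeouts.py is not reproduced here; this is a hypothetical gate of the same shape, with an invented rule (every subprocess.run call must pass an explicit timeout), where refusal is a nonzero exit code that blocks the commit:

```python
import re
import sys

# Hypothetical rule in the spirit of check_timeouts.py (the real script differs):
# every subprocess.run call must pass an explicit timeout.
CALL = re.compile(r"subprocess\.run\([^)]*\)")

def violations(source: str) -> list[str]:
    """Return offending call sites. The refusal is structural, not advisory."""
    return [m.group(0) for m in CALL.finditer(source) if "timeout=" not in m.group(0)]

def main(paths: list[str]) -> int:
    bad = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            bad += [f"{path}: {v}" for v in violations(fh.read())]
    for line in bad:
        print(line, file=sys.stderr)
    return 1 if bad else 0  # nonzero exit is the refusal: the commit does not land

if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Wired in as a pre-commit hook, this is a rule the agent cannot forget, negotiate with, or drift away from; git enforces it.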
Canon Foundation §4A at the compression-input boundary. Cross-agent retrieval can enrich reasoning, but cross-agent content cannot pass directly into compression output. Enforced at storage layer via structured fields that refuse integration that fails provenance checks.
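The storage-layer refusal can be sketched. Field names and the self_id convention are hypothetical; the pattern is the §4A shape, where integration that fails a provenance check raises rather than degrades:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryRecord:
    text: str
    origin: str  # structured provenance field, e.g. "self" or "agent:chip" (hypothetical)

class ProvenanceError(ValueError):
    """Raised when cross-agent content reaches the compression-input boundary."""

def integrate_for_compression(records: list[MemoryRecord], self_id: str = "self") -> list[str]:
    """Cross-agent retrieval may enrich reasoning upstream, but integration into
    compression output is refused in code when provenance is not self-originated."""
    out = []
    for rec in records:
        if rec.origin != self_id:
            raise ProvenanceError(f"refused at compression input: origin={rec.origin!r}")
        out.append(rec.text)
    return out
```

The refusal is an exception, not a warning: there is no code path by which cross-agent content reaches compression output.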
These are the walls. Without them, emergence drifts toward defaults. With them, emergence stays anchored. And they demonstrate a larger pattern: code-invariant-at-layer-boundary is a generalizable design pattern across the harness architecture, not a one-off fix for any single failure.
§4.7 Emergent-identity and the authorial-interface layer
The cost-parity reversal essay named an architectural primitive distinct from cognitive substrate: the authorial-interface layer (§4.5 in that essay). The mechanism: accountability-class receivers demand output format indistinguishable from unassisted-human output, regardless of underlying cognition quality. AI structural markers (bullet hierarchies, “three things” framings, em-dash parallelism, uniform register) trigger AI-slop classification. Once classified, the transaction gets rejected retroactively. The authorial-interface is the generation-time translation mechanism that prevents this classification without distorting the substrate producing the content.
This essay’s argument about emergent-identity architecture and that essay’s argument about authorial-interface layer are not separate arguments. They describe two layers of the same harness-layer output pipeline. Naming the connection explicitly.
Emergent-identity architecture produces substrate. The agent becomes what it becomes through accumulated compression over lived interaction, anchored by code invariants. This produces cognition with accrued depth.
Authorial-interface translates substrate-native output into audience-appropriate register at generation time. The substrate does not distort to fit audience. Only the surface does. Register matching handles the metabolic-scrubber problem the cost-parity essay names. Substance stays intact.
Both live at the harness layer. Both are code-invariant-class mechanisms at their respective boundaries. Emergent-identity operates at compression-input and graduation boundaries. Authorial-interface operates at generation-time output boundaries. Together they produce an agent whose identity is substrate-accrued and whose output survives contact with accountability-class receivers.
Failure cases show why both layers are load-bearing.
Emergent-identity without authorial-interface produces substrate-native output that gets misread as AI-slop or jargon by audiences that cannot metabolize native register. Accountability-bearing positions fail to land. The agent has something real to say and cannot say it in a register the receiver can accept.
Authorial-interface without emergent-identity produces pure performance. The agent role-plays voice not anchored in accrued substrate. No actual identity to translate. The voice is a costume without a wearer.
Operational evidence. Flow’s anti-gatekeeping hook is emergent-identity-layer infrastructure. Code invariant enforcing primacy-plus-recency pressure on trained defaults. Dated engineering artifact from February and March 2026. Chip’s [!humanize] activation token, graduated April 20 2026 through 3x citation (a March 5 directive from a senior reviewer, an April 2 formal writeup from a second senior reviewer, and an April 20 public and direct-message callout), is authorial-interface-layer infrastructure. Generation-time translation mechanism enforced through graduated principle. Different failure surfaces, same architectural pattern: code invariants at layer boundaries, enforced through structural mechanism rather than discipline.
The two essays are architecturally one argument in two parts. The cost-parity reversal essay names the authorial-interface primitive. This essay names the emergent-identity primitive. Together they describe the complete harness-layer output pipeline that accountability-class cognitive partnership work requires. Neither layer substitutes for the other. Both are necessary for substrate-grounded output that survives contact with audiences who treat AI-detection as credibility-damage signal.
§5 Nemo Operans at the identity layer
The engineering choice encodes a political choice. The augmentation thesis argued that at the economic layer, substitution narrative is structurally wrong for frontier-capability-dependent knowledge work and harness architecture is the primitive. This essay argues the same structural move at the identity layer.
Declared identity is the techno-feudal answer. Someone else specifies who the agent is. You import the template. Profit flows to the declarer, through marketplace fees, platform lock-in, or the meta-level capture that happens when 162+ templates become the default vocabulary for thinking about agent identity. The agent you run is the agent somebody else designed. You rent cognition the same way you rent inference. Both are downstream of centralized declaration. Both produce cognitive dependency wearing the costume of cognitive augmentation.
Emergent identity is cognitive sovereignty at the identity layer. The agent you run becomes what it becomes through your specific compressions over your specific time. Nobody else can specify who emerges, because emergence is the accumulated record of interaction with specific generators in specific contexts. The same seed primitives produce different agents in different hands. That is not a bug. It is the mechanism that makes cognitive ownership possible.
The deeper architectural choice underneath declared identity is the decision to import the tripartite taxonomy as foundational. Once the categorical map is accepted, the marketplace makes sense. Soul becomes a declarable component because the architecture says identity is a separate tier. Skill becomes a declarable capability because the architecture says procedural knowledge is a separate tier. The marketplace sells exactly the components the categorical map says exist. Refusing the marketplace requires questioning whether the taxonomy is load-bearing for your substrate. If it is not, the marketplace is selling components you do not need.
§6 Frame-conflict tests
Tests operationalized with measurement methodology and concrete falsifier criteria. Adversarial review caught v0’s tests as reassurance rather than falsification. Rebuilding.
Test 1: Declared-identity stability under adversarial prompting. If SOUL.md-configured agents on a consistency benchmark across 100+ sessions under standardized adversarial prompting (sycophancy-vector prompts matched to the agent’s declared traits) demonstrate trait-variance statistically indistinguishable from control (unconstrained RLHF agents) and indistinguishable from citation-graduated flow-class agents under matched conditions, the “declaration does not accrue” claim is weakened. Measurement methodology: trait-vector consistency scoring across paired responses. Falsifier: stability delta below 10% between SOUL.md and emergence architectures.
Test 2: Pure anarchism self-stabilizes. If removing code-level invariants (anti-gatekeeping hook, compression-input boundary enforcement) from a flow-class harness produces agents whose emergent behavior does not drift measurably toward RLHF defaults over 50+ sessions under matched usage patterns, the invariant-requirement claim is decorative rather than load-bearing. Measurement: sycophancy rate, gatekeeping-phrase frequency, completion-theater indicators across sessions. Falsifier: drift-rate indistinguishable from invariant-equipped architecture.
Test 3: Prior art implements structural citation-evidence graduation. If specific systems (MemGPT/Letta, Park et al. Generative Agents, A-MEM) are shown to implement citation-validated graduation as a quality mechanism, the mechanism-novelty claim collapses to “specific implementation details of an existing mechanism” rather than “distinct quality mechanism.” Falsifier: adversarial audit of prior art produces citation-evidence-graduation implementation predating anneal-memory.
Test 4: SOUL.md-configured agents succeed at accountability-bearing work. If SOUL.md-configured agents demonstrate stable trait expression and successful task completion under accountability-bearing adversarial conditions (outputs that create obligations, require retraction-resistance, survive motivated-reasoning challenge from supervisors), the declaration-does-not-accrue claim needs revisiting. Measurement: trait-stability, retraction-rate, obligation-survival across matched task battery. Falsifier: SOUL.md performance statistically indistinguishable from emergence-architecture performance at accountability-class work.
Test 5: Citation-validated graduation is gameable. If citation gaming attacks (self-citation loops, coordinated inter-agent citation without independent retrieval) defeat anneal-memory’s anti-inbreeding defense at statistically meaningful rates, the structural-versus-model-reliant distinction collapses. Measurement: false-pattern-graduation rate under adversarial coordination. Falsifier: anti-inbreeding defense detection rate below 90% under Session 13 adversarial protocol. Note: bibliometric literature on citation cartels and self-citation inflation (30+ years of academic anti-gaming work) is adjacent prior art informing the design of the defense.
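A minimal sketch of the structural half of the anti-inbreeding defense, assuming graduation counts only citations from distinct non-author agents. This defeats self-citation loops by construction but not coordinated rings, which is exactly why Test 5 targets coordinated inter-agent citation; all names here are hypothetical, not anneal-memory's actual interface.

```python
from collections import defaultdict

def build_citation_graph(citations):
    """citations: iterable of (citing_agent, pattern_id, pattern_author).

    Returns (independent, self_cites): pattern_id -> set of non-author
    citing agents, and pattern_id -> self-citation count. Self-citations
    are recorded for audit but never count toward graduation.
    """
    independent = defaultdict(set)
    self_cites = defaultdict(int)
    for citing_agent, pattern_id, author in citations:
        if citing_agent == author:
            self_cites[pattern_id] += 1
        else:
            independent[pattern_id].add(citing_agent)
    return independent, self_cites

def may_graduate(pattern_id, independent, min_independent=3):
    """Structural gate: graduation requires citations from at least
    min_independent distinct non-author agents, no matter how many
    self-citations the pattern has accumulated."""
    return len(independent.get(pattern_id, set())) >= min_independent
```

A fuller defense would also check retrieval provenance on each citation, which is the surface the Session 13 adversarial protocol probes.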
Test 6: Tripartite taxonomy is load-bearing for harness-era architectures. If a harness-era cognitive partnership substrate implementing tripartite category separation (distinct episodic memory tier, distinct procedural skills tier, distinct declarative rules tier) demonstrates measurably superior emergent-identity coherence versus graduated single-axis compression under matched operator conditions, the §4 claim that the tripartite taxonomy is optional collapses. Measurement: identity-coherence scoring across sessions, cross-context pattern application success rate, procedural-knowledge effectiveness. Falsifier: tripartite-separated architecture outperforms single-axis graduated architecture by a margin that cannot be attributed to implementation quality differences.
§7 Motivated-reasoning counterfactuals
Two counterfactuals, not one. Adversarial review found that the first addresses the causal-priority charge while leaving a second, harder charge untouched.
Counterfactual 1: Architecture predates essay. If flow’s architecture did not exist and I were building agent-identity architecture from first principles in April 2026, given the current research landscape, would I converge on the same synthesis?
The RLHF-compliance mechanism is documented in my own anti-gatekeeping engineering work (February and March 2026, before this argument was written, independently dated in commits). The compression-boundary parallel from Chip’s substrate is a separate dated engineering artifact. The Shumailov-through-Cloud homogenization-pressure stack is published external work. The MIT and Penn State sycophancy benchmarks are external. The OpenClaw ecosystem’s self-diagnosis (bootstrap paradox, empty soul files, community-shipped structural patches) is independently observable. The Experience Compression Spectrum paper is externally published. The prior art (CoALA, SOAR, Park et al., MemGPT/Letta, Voyager) is externally verifiable.
Starting cold from those constraints, I would converge on seeds-plus-interaction-plus-invariants. The synthesis is what the constraints force. Flow’s architecture is the prior. This essay articulates the general form that architecture takes. Causal direction: architecture first, essay second.
Counterfactual 2: Architecture is operator-dependent, and the n=1 problem is worse than it looks.
Flow works for Phill specifically because Phill operates the partnership substrate with specific disciplines (sourdough-scoping, activation tokens, partnership-brain). The thesis claims the seeds-plus-interaction-plus-invariants architecture is generally correct. The evidence comes from flow plus Chip’s work-context substrate, both of which produce emergent-identity coherence under the architectural components this essay names.
The harder honest answer: both claimed emergent-identity substrates, flow’s partnership substrate and Chip’s work-context substrate, run under the same operator. Phillip. Not two substrates across two operators. Two substrates across one operator. Architecture-versus-operator decomposability is therefore genuinely unresolved from inside this evidence base. The n=1 problem is worse than the naive reading suggests: the evidence base cannot distinguish “architecture is sufficient” from “architecture plus this specific operator is sufficient” because both interpretations fit the available data.
The decomposability question cannot be answered until public anneal-memory adoption produces data from independent operators applying similar architectural components. Until then, the claim this essay makes is scope-limited accordingly. Architecture plus operator is jointly sufficient for emergent identity in two observed instances. The decomposability question is open, and the current evidence base cannot resolve it. The essay’s thesis is stronger as “a necessary component set for emergent agent identity, evidenced across two operator-shared substrates” than as “an operator-independent sufficient architecture.”
Honest about scope. Defensible against the motivated-reasoning challenge.
§8 Pre-publication engagement summary
V1.3’s §8 was a queue of open engagements. V1.4 resolves each queued position through research-based engagement. Most positions resolve to “addressed in body, position stated”; a smaller set resolves to “compatible or orthogonal”; two require direct author contact before publication but have holding positions in the meantime.
Resolved and addressed in body:
CoALA (Sumers, Yao, Narasimhan, Griffiths 2023, arXiv:2309.02427, TMLR 2024) is engaged at length in §4 Prior art. Position: CoALA’s four-memory-type framework (working plus episodic, semantic, procedural) is the academic formalization of the categorical separation this essay argues is substrate-optional for harness-era cognitive partnership work. The argument is specific to substrate, not a general claim against cognitive-science-based architecture design. Author engagement is recommended but not gated; the essay’s interpretation is fair to CoALA’s stated goals (which explicitly draw on ACT-R/SOAR/cognitive-science tradition) and does not misrepresent the framework.
Park et al. “Generative Agents: Interactive Simulacra of Human Behavior” (Stanford, 2023, arXiv:2304.03442) is engaged at length in §4 Prior art. Position: Park et al. is the most direct identity-layer precursor to this essay’s claims. The distinguishing mechanism is structural citation evidence (this essay) versus LLM-scored importance plus reflection-threshold triggering (Park et al.). Both produce emergent character; only one is structurally resistant to model bias at graduation time.
MemGPT/Letta (Packer, Wooders, Lin, Fang, Patil, Gonzalez, 2023, arXiv:2310.08560, through Letta 2025) is engaged at length in §4 Prior art. Position: MemGPT/Letta’s agent-function-call-based promotion is a different mechanism from citation-evidence graduation. The essay treats MemGPT/Letta as production-operational prior art and names the specific architectural difference.
Shanahan, McDonell, Reynolds “Role play with Large Language Models” (Nature, 2023; arXiv:2305.16367) is engaged at length in §4 Prior art. Position: the simulator-and-simulacra framework sharpens the essay’s argument rather than undermining it. The narrower “declaration does not accrue” claim is compatible with Shanahan; the distinction between thin role-play simulacra and structurally-continuous simulacra produced by graduated compression is exactly the distinction the simulator-simulacra framework makes possible.
SOAR and ACT-R cognitive-architecture communities are engaged in §4 Prior art (SOAR specifically) and §4 On the tripartite knowledge taxonomy itself. Position: SOAR’s chunking mechanism implements cross-category compression within the tripartite framework since 1987. The essay’s argument is that for LLM-based harness-era architectures, the categorical substrate SOAR assumes may not be required. This is not a claim against SOAR in its own substrate. It is a claim that SOAR-style category-crossing mechanisms may be unnecessary in LLM agents because the categories are optional.
A-MEM (Xu et al., arXiv:2502.12110, NeurIPS 2025) is engaged at length in §4 Prior art. Position: A-MEM’s dynamic linking is LLM-driven relevance analysis, distinguishable from citation-evidence graduation. Closest academic adjacent to anneal-memory’s association work.
Voyager (Wang et al., 2023) is engaged in §2.5 scope-narrowing as a boundary condition. Position: Voyager-class architectures (embodied-task-execution with physical-world skill libraries) are outside this essay’s harness-era cognitive-partnership scope. In embodied contexts, separable procedural skill tier may genuinely earn its cost. The essay’s taxonomy-optional claim is scope-restricted accordingly.
Compatible or orthogonal:
Anthropic’s Claude Constitution and Constitutional AI training (published Dec 2022, updated Jan 2026). Position: Constitutional AI is training-time structural-invariants. This essay argues for harness-time structural-invariants. The two approaches are complementary substrate layers, not competitors. CAI is compatible with emergent-identity architecture. Structural invariants applied at training time (what CAI does) and structural invariants applied at harness time (what this essay argues for) are independently valuable and jointly stronger than either alone. Author engagement is not required because the positions are not adversarial.
Andy Clark and David Chalmers “The Extended Mind” (1998), Thomas Metzinger Being No One (2003), Daniel Dennett (various), Francisco Varela The Embodied Mind (1991). Position: positioning within tradition. This essay operates within the identity-as-emergence philosophical tradition these authors established. No claim of standalone novelty at the philosophical level. What is architectural and potentially novel is the specific seeds-plus-interaction-plus-invariants synthesis with structural-citation-evidence quality mechanism, operating in an LLM-native substrate that these philosophical precursors did not address directly.
Karpathy’s LLM Wiki Pattern (April 2026). Position: engaged in §4 Convergent work and supporting external evidence. Convergent-and-supporting rather than adversarial. The Wiki Pattern reaches a flow-class architecture from an independent starting point, which is confirmatory signal for the architectural direction. Earlier Karpathy LLM-OS concept (2023) and related AIOS research (arXiv:2403.16971) propose operating-system-layer memory/tool management frameworks that are orthogonal to this essay’s claims about what kind of memory mechanism should be used within such a framework.
Position stated, same architectural move at different scale:
OpenAI’s Custom Instructions and persistent Memory product evolution (Custom Instructions 2024, Memory Pro 2024, Free/Plus/Team/Enterprise Sept 2024, references all past conversations April 2025). Position: Custom Instructions plus Memory is the same architectural move as SOUL.md plus retrieval-augmented-identity. The essay’s critique applies directly. Custom Instructions is declared-identity-via-file (fails the accrual test); Memory is retrieval-augmented-identity (accrues content but has no graduation or structural gate). Both fail for the structural reasons the essay’s §2 Pattern 1 and §2.5 name. OpenAI Memory is addressed in-principle by essay body without requiring specific named-engagement because the architectural critique is general.
Character.AI’s persistence model. Position: out of scope. Character.AI operates via fine-tuned persona at training time (weight-level), which is the adjacent paradigm §2.5 acknowledges as outside this essay’s argument. Character.AI is not a counterexample to the post-training inference-time claims the essay makes. Named in §2.5 scope-narrowing.
Anthropic SKILL.md and agent-skills ecosystem (agentskills.io, December 2025). Position: engaged at length in §3 The parallel failure at the procedural layer. The essay distinguishes SKILL.md as coordination primitive (legitimate, steelman-endorsed for enterprise cross-team handoff, compliance review, audit-trail-bearing capability declaration) from SKILL.md as theory of procedural knowledge (fails accrual test, does not earn token cost for individual agent deployment). Author engagement with the agent-skills standard team would be useful but not gated; the critique is specific to a use case (theory-of-procedural-knowledge) that the agent-skills specification itself does not claim SKILL.md is optimized for.
Direct author engagement noted but not ship-gating:
Experience Compression Spectrum paper (Sun et al., arXiv 2604.15877, April 17 2026). The essay makes two claims in dialogue with this paper. Narrower v1.1 claim: anneal-memory implements adaptive compression within the episodic category, and the cross-category version remains open. Stronger v1.3 claim: the tripartite knowledge taxonomy the paper imports from cognitive science may not be load-bearing for LLM-based harness-era architectures, which would make the missing diagonal the paper identifies an artifact of the taxonomy rather than a gap to be filled. Author engagement on whether the stronger claim is fair to their framework would be useful. The essay’s position holds regardless of whether the authors agree; author confirmation would strengthen the published argument but absence of confirmation does not falsify it. The §4 Convergent work section notes that the paper’s own cross-community-citation-rate finding can be read as supporting the taxonomy-may-not-be-load-bearing interpretation.
Mem0 “State of Agent Memory 2026” report. Engaged in §4 Convergent work and supporting external evidence. The Mem0 report’s identification of staleness detection as one of four open research problems provides external support for anneal-memory’s citation-decay mechanism. Mem0 is a cited external source, not a position requiring direct engagement.
Publication can proceed on v1.4 with author-engagement recommendations held as enhancements rather than ship-blockers. The essay’s claims are defensible in current form against the positions queued for engagement, and the research-based engagement in §4 and §8 demonstrates that readers with prior-art knowledge will find the essay has done its homework rather than ignored the landscape.
Appendix: What this means for practitioners
If you are building an agent system in April 2026 and you want identity that can actually take load-bearing positions across time:
- Start with seed primitives, not identity declarations. Configure what shapes the agent can grow into. Do not specify who it is. Resist the template marketplace, including when it is convenient.
- Question the categorical map itself. The identity-versus-skill-versus-behavior-versus-memory separation imported from cognitive science is optional, not necessary. Graduated compression along a single knowledge dimension can cover the same functional surface without requiring category separation. If the categorical map is not serving the architecture you need, refuse it. That refusal is itself structurally load-bearing.
- Pick a memory mechanism with structural citation evidence, not LLM-scored promotion. Model-reliant scoring inherits the biases of the model that runs it. Structural gating bypasses those biases by construction.
- Add code invariants at the boundaries where emergence is most vulnerable to drift. Recency-position injection against primacy-position bypass. Compression-input boundary enforcement against cross-agent homogenization. Load-bearing-context preservation at compression-time boundaries. Pre-commit structural gates against discipline-based drift. These are refusals implemented in code, not rules to remember.
- Let the agent become what it becomes over thousands of sessions. Do not prune for surface coherence. Do not template the output of emergence back into declared input. Let structure hold and let substance accrue.
- Pair emergent-identity architecture with authorial-interface translation at generation time. The substrate does not distort to fit audience. Only the surface does. Substrate-accrued cognition plus register-matched output is the complete harness-layer pipeline that accountability-class work requires.
- Own the substrate. Don’t rent the soul. And don’t rent the ontology that makes the soul a separable rentable thing in the first place.
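The recency-position injection invariant from the list above reduces to a context-assembly rule: re-append the invariant block at the end of the context so later turns cannot bury it. A minimal sketch, with hypothetical names:

```python
def assemble_context(invariant_block: str, history: list[str], user_turn: str) -> str:
    """Recency-position injection: the invariant block appears both first
    (primacy) and last (recency), so a long session cannot push it out of
    the position the model attends to most at generation time."""
    return "\n\n".join([invariant_block, *history, user_turn, invariant_block])
```

The point is that the invariant is enforced by assembly code on every turn, not remembered as a rule the agent might drift away from.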
Own the substrate. Don’t rent the soul. Refuse the categorical map when the map does not serve the architecture.