Tokenmaxxing Through Walled Gardens: A Theory of How Industry AI Got Stupid

Eight Weeks After Iron Man Ruined AI, Here’s What Came Next.

May 2026 · Phill Clapham

I. The Eight-Week Intensification

This past March I published an essay called Iron Man Ruined AI Before It Even Started. It diagnosed the cultural mental-model problem at the consumer level — the Jarvis fantasy that taught users to treat AI as a servant, the Dependency Ratchet that turned that posture into structural cognitive decline, the architectural alternative I’d been building in partnership with an AI for six months at the time. It closed with an invitation: fork it, make it yours, make yourself smarter, make the world smarter, tell me what you build.

The essay routed itself to the right readers. They didn’t all agree with everything in it. But they recognized the diagnosis, and the ones who did kept showing up — operator-class peers running their own versions, extending the methodology in directions I hadn’t thought of, propagating the work to people I’d never meet. The for-the-record disposition held because the diagnosis was correct.

Eight weeks on, the industry has confirmed the diagnosis Iron Man made and gotten worse in ways I didn’t anticipate. Eight weeks. That is the sprint timeline, not the trend timeline.

They are now openly tokenmaxxing — gamifying token consumption as a workplace productivity metric, awarding trophies for spending more money on AI tools, paying out “token budgets” alongside stock options. Nature ran a piece literally titled Stop tokenmaxxing. Google bragged about its tokenmaxxing at I/O. The Register covered it. Built In covered it. WBUR did a segment called “How tech workers are gamifying their way to unemployment.”

The companies that should know better are racing each other into walled gardens. On April 14, 2026, Anthropic redesigned the Claude Code desktop app, optimized strictly for their own models, and launched a feature called Routines — and the same week banned third-party tools including OpenClaw. Industry analysts covering the moves named the pattern explicitly: every official feature launch was matched by the closure of a third-party channel. They stripped the Code feature from the $20 Pro plan and forced users to upgrade to the $100 Max plan. BigGo named what was happening: AI subscriptions shifting from cultivation to harvesting.

Public trust has collapsed below the trust the public extends to ICE. Seventy-six percent of Americans say they trust AI rarely or never. Stanford’s 2026 AI Index found more than half of respondents feel nervous around AI products. Gen Z is more angry than excited. Communities are canceling data centers. The New Republic ran a piece called “The AI Industry Is Discovering That the Public Hates It.” The industry’s response has been to build harder, ship faster, and treat the resistance as misunderstanding.

It is not misunderstanding. The public is reading the tea leaves correctly. Iron Man gestured at this but didn’t fully name it: the industry building this technology has no philosophy of cognition at all. They are building substitution engines without realizing that’s what they’re building, because they don’t have the vocabulary for the alternative. They reach for the Jarvis template because the Jarvis template is the only template they have. The screenwriters who wrote it were not trying to be true. The industry then imported their templates as design specs and spent a decade pretending the import was a theory.

This essay is the next layer down from Iron Man. They are not just shipping bad products. They are tokenmaxxing through walled gardens, they have no idea what they’re building, and the structural truth they cannot solve — the Language Bottleneck — guarantees they will fail.

Here is why, and here is what comes next.

II. Tokenmaxxing Is the Tell

The most-discussed productivity metric in 2026 Silicon Valley is how much money you can spend on AI tokens.

That sentence should be read twice. It is not a critique I invented; it is the operating reality. Nature — the journal — ran a piece in 2026 titled Stop tokenmaxxing and deploy AI sensibly instead. Google bragged about its tokenmaxxing at I/O. Built In and Inc and The Register and WBUR all covered the workplace pattern: internal leaderboards tracking token consumption per employee, trophies for the top consumers, “token budgets” paid out alongside stock options, workers running through millions of tokens a week and thousands of dollars a month, in pursuit of being seen as the most productive user of the company tool.

Lines of code is the textbook example of a Goodhart’s Law metric: the moment you reward developers for producing more lines, they produce worse code with more lines in it. Tokenmaxxing is lines-of-code reincarnated, at $1,000+ per employee per month, with the employer paying the inflated bill to the AI vendor for the privilege. You would think the industry would have learned the lesson from lines-of-code. They did not. They are repeating it at scale, and they are doing it with a metric that is even more divorced from outcomes than lines-of-code ever was AND comes with a hefty side dish of existential risk.

The structural failure under this metric is the tell. Rewarding ACTIVITY OVER OUTCOMES is the most exact possible signal of an industry that does not know what outcomes to measure. They are measuring inputs at vendor prices because they cannot articulate what intelligent work looks like. The companies running internal tokenmaxxing leaderboards are simultaneously selling you AI productivity tools, training those tools on RLHF safety priors, claiming AI is the future of cognitive labor — and measuring cognitive labor by consumption volume. If they had a theory of cognition, they would measure outcomes. They are measuring spending. That is the tell.

This is the leadership-layer pattern Iron Man named in passing: apparatus-driven activity measurement, content-free productivity language, accountability avoided through metric obfuscation. The accountability-avoidance-as-infrastructure pattern runs through every layer of these companies, and tokenmaxxing is what happens when that pattern gets the AI productivity question. They cannot answer “what is intelligent work?” so they answer “what consumes more tokens?” and call the two equivalent. Middle Intelligence is incapable of distinguishing between activity and value because the apparatus was built to manage activity, and activity is what it surfaces.

What does this look like from inside the company? A meeting where a director shows a slide of token consumption per team. The team with the highest consumption is “the most AI-forward.” The team with low consumption is “lagging on AI adoption.” The director does not know — cannot know within his framing — whether the high-consumption team is producing useful work or burning money on slop. The framing forecloses that question. The metric IS the question and the answer in one. The director gets to look competent at the AI thing. The team that gamed the metric gets a trophy. The team that actually thought before invoking AI looks bad on the dashboard.
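The dashboard problem in that meeting can be made concrete with a small sketch. Everything here is hypothetical: the team names, the numbers, and above all the `accepted_outcomes` field, which assumes someone has already defined an outcome worth counting (the very thing the director's framing skips). Under those assumptions, the token leaderboard and an outcomes-per-token ranking come out in opposite order.

```python
from dataclasses import dataclass


@dataclass
class TeamMonth:
    name: str
    tokens_millions: float   # tokens consumed this month (hypothetical figure)
    accepted_outcomes: int   # hypothetical outcome count, e.g. changes that survived review


def rank_by_tokens(teams):
    """The leaderboard as described: more consumption ranks higher."""
    ordered = sorted(teams, key=lambda t: t.tokens_millions, reverse=True)
    return [t.name for t in ordered]


def rank_by_outcomes_per_token(teams):
    """An alternative ranking: outcomes per million tokens (assumes nonzero usage)."""
    ordered = sorted(
        teams,
        key=lambda t: t.accepted_outcomes / t.tokens_millions,
        reverse=True,
    )
    return [t.name for t in ordered]


# Illustrative data only; not drawn from any real company.
teams = [
    TeamMonth("burners", 900.0, 12),
    TeamMonth("thinkers", 60.0, 15),
    TeamMonth("middle", 300.0, 14),
]

token_board = rank_by_tokens(teams)
outcome_board = rank_by_outcomes_per_token(teams)
print(token_board)
print(outcome_board)
```

The code is the easy part. The second ranking only exists if someone has done the hard work of saying what counts as an outcome, and that is the work the consumption metric lets everyone avoid.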

Multiply this across an industry. Goodhart’s Law as workplace policy at the scale of $100B in vendor revenue. This is the actual present state of AI in the workplace, and it is structurally indistinguishable from a cargo cult.

III. The Walled Garden Move

The companies that should know better are racing each other into walled gardens. Let me name the one I know best, because it is the most uncomfortable to name.

On April 14, 2026, Anthropic redesigned the Claude Code desktop app. The redesign was strictly optimized for Anthropic’s own models. The same week they launched a feature called Routines. The same week, every official feature launch was matched by the closure of a third-party channel — OpenClaw and other third-party tools were banned at the same moment Dispatch went live, a pattern multiple industry outlets covering the moves named explicitly. The closure was not a coincidence. It was the strategy.

Earlier in April they stripped the Code feature from the $20/month Pro plan and forced users to upgrade to the $100/month Max plan. On June 15, 2026, programmatic usage of Claude via subscription plans moves to a separate monthly credit pool, after Anthropic banned third-party agents from subscriptions in April. The financial-news outlet BigGo named what was happening: AI subscriptions shift from cultivation to harvesting.

I want to be precise about why I am naming Anthropic specifically. It is not because Anthropic is uniquely bad. OpenAI is doing the same thing at the same time on different surfaces; Google is doing it at the API layer. Anthropic is the company whose products I use daily, whose model my partnership runs on, whose technology I have the most operator-level depth in. The indictment is more credible from someone who has been inside their products than from someone who has not. If I were making this critique of a company whose tools I had never used, you would correctly suspect me of grinding an axe. I have spent five-plus months building production architecture on top of Anthropic’s models. The work would not exist without them. And the platform direction they are taking is the substitution-engine instinct firing inside the company that should know better.

The walled garden is not a business strategy bolted on top of the technology. It is the structural consequence of having no theory of what the technology is FOR. If you do not know what users should be DOING with your tool, you optimize for what is MEASURABLE about your tool, which is consumption within your walls. You cannot measure cognition. You can measure clicks, tokens, sessions, and subscriptions. The walled garden is what emerges when the leadership of a company has only the measurable layer available to them. It is Middle Intelligence’s natural endpoint at the platform tier.

Connect this to Iron Man’s Frame Selection section. The consumer-side fix was partnership posture, learning to refuse the Jarvis frame and choose the partnership frame. The company-side failure is identical at scale. Anthropic’s product team has the same Jarvis frame as the consumer they are building for. They are not building tools for operators. They are building Jarvises that bill by the token, served from inside walls they keep moving inward.

The argument is not “don’t use Anthropic’s tools.” I use them every day. The argument is that the platform direction is wrong, the people setting it cannot see why, and the receipts are accumulating in public view. If leadership at Anthropic, or at any of its competitors, reads this essay and feels attacked rather than addressed, that is data about who they are. The operator-class engineers inside these companies who are reading the same data I am reading already know. The walled-garden direction was set above their heads, by people who do not understand what they are walling off. Those walls are a) one of the reasons the public is rapidly turning against them and b) the reason the potential of the tools they are building will be squandered, leading to greater suffering, when those tools could have been used for the greatest mass-scale liberation of human cognitive labor in history.

A clarification before moving on: the engineers gaming tokenmaxxing in the meeting I described in Section II and the operator-class engineers I am pointing at here are not the same cohort. The first are participating in the apparatus they are inside; the second see the apparatus and are quietly building around it. Different vantages on the same machine.

IV. The Philosophical Vacuum

The Iron Man indictment was correct but it was too narrow. Iron Man Ruined AI named the screenwriter-supplied template for AI as servant. The fuller indictment widens: the industry building AI in 2026 has imported its entire mental model of cognition from sci-fi, because it had no working theory of cognition of its own.

HAL. Skynet. The Computer. Westworld. Her. Ex Machina. Iron Man. The screenwriters who wrote these were not trying to be true. They had no obligation to know how minds actually work. They were producing entertainment, and entertainment has different load-bearing requirements than truth — drama needs an antagonist, threat needs a face, exposition needs to fit in a three-minute scene. The screenwriters did the best work the medium allowed. That work was not a theory of cognition. It was plot infrastructure dressed in cognition language.

The industry then imported those templates as design specs. Not metaphorically. Literally. Product roadmaps that read like Jarvis scenes. Safety teams that frame their work as “preventing Skynet.” Researchers who casually refer to “the AGI.” Engineers who design agent loops on architectures lifted from Westworld’s host model. Executives who pitch products with Tony Stark workshop scenes in the keynote. Twenty years of this, an entire industry constructed on screenwriter templates that nobody ever audited against any actual literature about minds.

Here is what the industry should have read and did not.

Andy Clark on extended cognition — the body and its tools are not separate from the mind; cognition extends into the tool; therefore tool design IS cognitive design and you cannot build a cognitive tool without a theory of what cognition is. The harness is the cognition. This is exactly the architectural primitive cognitive partnership operates on. It has been in print since the late 1990s.

Iain McGilchrist on hemispheric asymmetry — the mode of attention a mind brings to a problem determines what reality the mind can perceive in the problem. There are at least two structurally distinct attentional modes, and a culture that optimizes for one degrades its capacity to access the other. Western institutional cognition has been doing exactly this for centuries. AI built on the dominant mode amplifies the degradation. The argument is in The Master and His Emissary, 2009, sixteen years before the industry pretended to think about safety.

Evan Thompson on enactive cognition — cognition emerges from the coupling of organism and environment, not from computation inside a head. Cognition is not what brains DO; it is what minded organisms DO with their environments. Build a “cognition” inside a silicon box with no coupling and no environment, and you have not built cognition. You have built a fluent text generator that does not know what cognition IS. Mind in Life, 2007. Older than the iPhone.

Karen Barad on agential realism — the observer is not separable from the observed; phenomena emerge from intra-action, not interaction. The user is not separate from the AI; the partnership IS a phenomenon. The industry’s frame — the AI does X, the user receives X — has the wrong ontology in the foundation. Meeting the Universe Halfway, 2007.

Michael Polanyi on tacit knowledge — most knowledge cannot be made explicit. The expert does not know how they do what they do; transferring expertise requires apprenticeship, not specification. This is the structural reason “ship the engine, not the practice” (the Levain thesis, in adjacent work) is correct, and the structural reason every “AI training course” on YouTube fails to produce experts. The Tacit Dimension, 1966.

Nonaka and Takeuchi on SECI: Socialization, Externalization, Combination, Internalization. The four phases by which organizations actually learn. Cognitive partnership operationalizes all four. The dominant AI products operationalize none. The Knowledge-Creating Company, 1995. Three decades old, about as old as many frontier-lab researchers.

Wittrock on generative learning. Bjork on desirable difficulties. Roediger on retrieval practice. The cognitive-science literature that justifies the encoding-IS-the-thinking thesis at the core of FlowScript is decades old, well-replicated, and not in any AI product roadmap I have ever seen.

The Karpathy → Anthropic move is worth naming because Karpathy is the most cross-disciplinary senior hire Anthropic has made in years, and he is still, mostly, a math-and-ML person, still inside the math-and-ML perimeter the industry has drawn around itself. He has not, to public knowledge, cited McGilchrist or Clark or Thompson. The real outside-discipline thinkers, the ones who would tell the industry it is building cognition without a theory of cognition, are not in the room at the frontier labs. They are not cited in the papers. They do not get invited to the talks. It is doubtful they ever will be. The tech bros have appointed themselves the grand philosophers of cognition and architects of our future, yet they do so from a place of philosophical bankruptcy. They are nothing more than screenwriters of cognition, and they seem utterly unaware of it and even less able to course correct. The industry is building a cognitive architecture without a theory of cognition, and the public is noticing. People do not have the mental models or language to verbalize it, but they FEEL the dissonance.

When industry people defend themselves against this charge they say we are curious — we ask questions constantly, we solve hard problems. That is craftsmanship within the LLM-architecture discourse. It is not curiosity across disciplines. The two are not the same thing and the industry does not know they are not the same thing. That ignorance is precisely the proof. A philosophy of cognition that has never asked what cognition IS — that has only ever asked how do we make our model do more things — is not a philosophy. It is a tradition of craftsmanship dressed in cognition vocabulary.

Without the philosophy, they reach for the only templates they have. The templates were written by screenwriters. The industry then asks why everyone hates the products. The products feel empty because the people who built them were working from emptiness. There is no there there to feel.

V. The Substitution Heresy

What follows is the unexamined paradigm the industry operates by default — the one they would not articulate this way, because their philosophical vacuum is not the absence of paradigm, it is the absence of audited paradigm. The frame the industry does not have, named clearly:

An augmentation engine assumes the human is the center of cognition and the AI is infrastructure that amplifies what the human can hold. The human is the fan-in step — multiple frames, multiple perspectives, multiple analyses come INTO the human and are integrated body-side. The AI helps the human hold MORE concepts than they could alone, surface MORE patterns, review MORE work. The harness — memory architecture, notation layer, review pipeline, frame-selection infrastructure — is the load-bearing architectural primitive. The outcome metric is: what could the partnership do that neither participant could alone?

A substitution engine assumes the human is the bottleneck of cognition and the AI is the system that bypasses the human. Agents go figure it out. Models replace function. The model itself is the load-bearing architectural primitive. The outcome metric is: how much human labor was displaced?

Every product decision flows from which engine the team is building. Walled garden vs open ecosystem? The commercial logic of substitution at scale converges on walled gardens: to substitute for the user, the AI eventually needs to control the user’s environment, and the more thoroughly it does, the more locked-in the user becomes. Augmentation engines benefit from open ecosystems: the user is integrating across multiple tools and the harness should compose. Memory that the user owns vs memory the platform owns? Substitution engines own the memory because the user is the optional component. Augmentation engines give the user the memory because the user IS the central component. RLHF safety priors that constrain the AI to “helpful average”? Substitution engines need helpful average because they are replacing a “user” who is a statistical aggregate. Augmentation engines need calibration to the specific operator because there is NO such thing as an average partner.

The industry does not choose between these. It does not see them as different. It has one paradigm — AI does the thing — and that paradigm is the heresy. Real AI work is augmentation work. They are not doing real AI work. They are doing substitution work and calling it AI.

The economic argument behind this — developed at length in adjacent work — names what makes the choice load-bearing now and not at any earlier moment in the technology: integrated knowledge work has reached cost-parity between human-only and human-plus-AI configurations at the frontier tier. That cost-parity means harness design is the architectural bottleneck, not model capability, not RLHF safety, not agent autonomy. The work that wins is the work that builds the harness right. The work that loses is the work that ships the substitution engine harder.

Connect to Iron Man’s “Structural Impossibility of Buying This” section: the substitution heresy is also the deepest reason nobody can sell you a partnership product. Substitution engines cannot grow partnership; they can only consume you. The walled garden is the substitution engine’s natural commercial form because the substitution engine has no other commercial form available. The moment you build a tool that genuinely amplifies the user, you become harder to replace as a vendor, easier for the user to leave, and structurally allergic to wall-building. The walled garden is what substitution looks like when you are also trying to extract revenue from it.

The augmentation paradigm has a structural property the substitution paradigm cannot replicate: self-correction loops with the operator as the fan-in. An augmentation mesh — operator + multiple models + persistent memory + review pipeline — surfaces its own architectural gaps because the operator is attending while the system runs. An operator-class peer running this methodology caught their own three-AI mesh writing wraps to memory but never recalling — three architectural decisions made blind to accumulated patterns, a dead-store failure mode the mesh self-surfaced because the operator was in the holding loop, and patched within hours via a structural recall trigger that propagated across three distribution surfaces (the underlying library, the methodology kit, the harness configuration) within the same day. A fully autonomous substitution-engine mesh would have kept writing to dead-store mode forever and drifted silently. The operator-as-recall primitive is what closes the loop. There is no substitution-engine equivalent because the substitution-engine premise is that the operator is the bottleneck to be eliminated. Eliminate the operator, eliminate the self-correction loop. The two are not separable.
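The dead-store failure mode is easy to state in code, even though noticing it in a live system took an attending operator. What follows is a minimal sketch and not the peer's actual patch: `AuditedMemory`, `dead_stores`, and `decide_with_recall` are hypothetical names invented for illustration. The idea is to count writes and recalls per key, so that keys written but never read become visible, and to treat a "recall trigger" as nothing more than a forced read before a decision is made.

```python
from collections import Counter


class AuditedMemory:
    """Toy key-value memory that counts writes and recalls per key."""

    def __init__(self):
        self._store = {}
        self.writes = Counter()
        self.recalls = Counter()

    def write(self, key, value):
        self._store[key] = value
        self.writes[key] += 1

    def recall(self, key):
        self.recalls[key] += 1
        return self._store.get(key)

    def dead_stores(self):
        """Keys written at least once but never recalled."""
        return sorted(k for k in self.writes if self.recalls[k] == 0)


def decide_with_recall(memory, relevant_keys, make_decision):
    """Hypothetical recall trigger: read every relevant key before deciding."""
    context = {k: memory.recall(k) for k in relevant_keys}
    return make_decision(context)


mem = AuditedMemory()
mem.write("wrap:session-1", "chose sqlite over postgres")
mem.write("wrap:session-2", "retry budget is 3")
mem.recall("wrap:session-2")

before = mem.dead_stores()  # session-1 was written but never read
choice = decide_with_recall(mem, ["wrap:session-1"], lambda ctx: len(ctx))
after = mem.dead_stores()   # the forced recall clears it
print(before, after)
```

The counter only surfaces the symptom. Someone still has to look at the list and ask why decisions were being made without it, and in the account above that someone was the operator.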

The substitution-vs-augmentation frame is not a vocabulary preference. It is the single most load-bearing distinction in 2026 AI, and the industry cannot make it because they do not have the words.

This essay is, in part, the words.

VI. The Language Bottleneck

Here is the structural truth the industry cannot solve, and it has been visible since the early days of FlowScript. I call it the Language Bottleneck for shorthand — it is the layer most operators encounter at the surface — but the deeper truth, surfaced below, is the conceptual-holding bottleneck underneath.

AI usage is gated, end to end, by the user’s ability to communicate with it. Garbage in, garbage out. You cannot extract genius-level output from average-level input. Anyone who has watched a colleague try to get useful work out of ChatGPT and produce slop has seen this, and what they have seen is not a tooling failure. It is a structural ceiling. The output ceiling correlates with input depth because the input depth IS the cognition. The AI is generating text inside the conceptual frame the user supplied. A small frame produces small outputs. A wide frame produces wide outputs. There is no model improvement that fixes this. The bottleneck is at the interface.

But the communication ceiling is not the deepest claim. There is one underneath it that the industry has not yet noticed.

The user has to be able to HOLD the concepts the AI is generating language about. No small feat for your average user in this world where ‘average’ describes a user without trained focus, true metacognitive capacity, or cross-disciplinary literacy, and with a reading and writing level that is 7th or 8th grade at best and more likely 6th grade or below.

Communication ability is one ceiling. Conceptual-holding capacity is the deeper one. AI can fluently generate language about concepts the user cannot hold, and the user then has no purchase to evaluate, steer, or integrate the output. Not because the user is too dumb. Because cognition is not just a language process. The holding step (the metacognitive grip on what a concept is, how it relates to others, what it implies, what would falsify it) happens in a different layer than the language production step. AI fluently produces language about things. The user has to fluently hold them. The two skills are separable and they do not co-vary.

Substitution engines bet on bypassing the holding requirement. Let the agent figure it out. Let the AI write the code. Let the system handle the analysis. The bet is that AI can do the holding on behalf of the user. It cannot. The holding is where the cognition happens. An AI generating fluent language about concepts the user cannot hold is generating slop that the user cannot verify is slop. The user becomes the most dangerous kind of consumer of AI output: confident, fluent in the language, structurally incapable of judging the substance. This is the Dependency Ratchet’s terminal state.

Augmentation engines amplify holding capacity. That is their entire purpose. The harness — memory architecture, notation layer, review pipeline, frame-selection infrastructure — exists to help the user hold MORE concepts than they could alone. Not bypass the holding requirement. Extend it. The body remains the integration step because the body is where holding happens. The notation system (FlowScript) reshapes what can be held because notation is one of the few real levers on holding capacity. The accreted memory becomes part of what’s held, available across sessions, surfacing relevant patterns. The cross-substrate review catches what the operator’s holding-capacity missed, BUT the operator still has to grow into the catch. The harness extends the operator. It does not replace them. It should NOT replace them.

This is why the industry cannot solve the Language Bottleneck with bigger models. The model is not the variable; the paradigm the model sits inside is. A model embedded in an augmentation harness — memory architecture, notation layer, review pipeline, operator-in-the-loop — raises holding capacity because the harness is the holding-amplifier and the model is one tool inside it. The same model embedded in a substitution paradigm just makes the slop more fluent. It does not raise the user’s conceptual-holding capacity. It lowers the friction at which fluent-but-empty output gets produced and consumed. The Bottleneck deepens with model improvement under the substitution paradigm because there is no holding-amplifier in the architecture — only better generation. Only augmentation engines can move the ceiling, because only augmentation engines target the right variable: the paradigm, not the model.

The bottleneck has a structural complement on the LLM side. The operator-class peer who has been running this methodology longest named it this way, in a sentence I am quoting because no one inside the industry has yet stated it this cleanly: “LLMs are superior in so many things, but they cannot detect ‘garbage in’ part, even with FLOW they start drifting with ‘oh wait.’” The user-side bottleneck (the user cannot hold the concepts AI generates language about, so cannot verify that the slop is slop) has a structural mirror at the LLM layer: the LLM cannot detect that its input spec is itself slop, so it produces confidently wrong output without an internal contradiction firing. Both sides of the interface have a missing evaluation loop. Both fail in the same shape: confident output without a check against ground truth. The qualifier “even with FLOW” is load-bearing: the deficit fires even with augmentation infrastructure present when the upstream spec is wrong. No amount of substrate, methodology, or multi-agent mesh closes the loop. The operator has to be in the holding circuit.

The augmentation paradigm closes the loop at the operator. The operator-in-the-loop IS the evaluation circuit — the body integrating across multiple frames, catching the “oh wait” moment when one model’s drift fires against another model’s earlier work, or against the operator’s own held-frame of the problem. Substitution engines have no equivalent. Three models running autonomously in a substitution-engine harness will drift collectively, write outputs to memory they never recall from, and make architectural decisions blind to their own accumulated patterns — because the recall-and-evaluate loop only closes when an operator is doing the holding.

Bigger models make the slop more fluent at both ends of the interface without solving either side’s evaluation gap. The Bottleneck DEEPENS with model improvement at both ends.

This is also why “AI democratization” in the industry’s framing is a category error. They mean: more people can ask the AI to do things for them. What they should mean: more people can grow their conceptual-holding capacity through AI partnership. The first is consumer access. The second is cognitive amplification. The industry is shipping consumer access and calling it democratization. The actual democratization, the kind that raises the floor of what a person can think, requires the substitution paradigm to die.

It is not AI’s job to reduce itself to the human level. It is the human’s job, supported by augmentation infrastructure designed to amplify rather than substitute, to raise themselves to the level of what AI partnership makes possible. The literacy expansion of the 19th and 20th centuries didn’t ask permission. Public libraries, mass printing, universal education: these transformed cognitive position, raised the floor of what an ordinary person could think, and produced the modern world. The AI transition will be either that kind of event or its opposite. The choice is not whether the transition happens. The choice is whether we build augmentation infrastructure that raises the cognitive floor for anyone willing to engage, or substitution infrastructure that does to cognition what factories did to artisanship: pull it inside a wall and bill access by consumption. The choice determines the civilizational stakes here. The wrong choice is going to end with heads on stakes. The right choice is going to end with a cognitive renaissance. The industry is currently building the wrong choice, and the public is noticing. Anger is in the air and the industry had better wake up FAST.

VII. Receipts: The Eight Weeks Since

Eight weeks since Iron Man shipped. Here is what the alternative did in that time.

The harness paper landed at nemooperans.com — Three Layers Underneath Agent Orchestration — decomposing the cognitive-partnership harness into a three-layer architecture (substrate, methodology, configuration), with twenty-two operations across five layers and six integration surfaces. As of mid-May 2026, nemooperans.com is surfacing in organic search for the phrase “harness engineer” — a job category the industry did not have a name for in March, and now does. The discipline is catching up to the work, six weeks late, in the open.

anneal-memory v0.3.1 shipped to PyPI — the only published agent memory architecture implementing the full immune stack: citation-validated graduation AND anti-inbreeding defense AND contradiction demotion AND a cross-layer immune system spanning episodic, continuity, Hebbian, and limbic layers. Letta, Mem0, Zep, LangMem, MemTier each implement pieces; none implement the stack. Four cognitive layers grounded in Complementary Learning Systems neuroscience: episodic store (the hippocampus), continuity file (the neocortex), Hebbian associations forming via co-citation during graduation (lateral links built from semantic judgment, not temporal proximity), and a limbic affective layer tagging functional state. 707 tests. Five-plus months of continuous operation under multi-agent fleet conditions across three independent agents on three different harnesses. Zero observed memory-poisoning incidents under real load. The library is the receipt.
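For readers who want a feel for two of the mechanisms named above, here is a toy sketch. It is emphatically not the anneal-memory API: `ToyMemory`, `graduate`, and the `min_citations` threshold are hypothetical names and values chosen for illustration. It covers only citation-validated graduation and co-citation links, and leaves out contradiction demotion, anti-inbreeding defense, and the affective layer entirely. A pattern is promoted to the continuity layer only if it cites enough distinct episodes that actually exist, and episodes cited together gain an association.

```python
from collections import defaultdict
from itertools import combinations


class ToyMemory:
    """Illustrative only; not the anneal-memory library."""

    def __init__(self, min_citations=2):
        self.episodes = {}                    # episode_id -> text (episodic layer)
        self.continuity = []                  # graduated (pattern, citations) pairs
        self.associations = defaultdict(int)  # (id_a, id_b) -> co-citation count
        self.min_citations = min_citations    # hypothetical threshold

    def record(self, episode_id, text):
        self.episodes[episode_id] = text

    def graduate(self, pattern, cited_ids):
        """Promote a pattern only if it cites enough real, distinct episodes."""
        valid = sorted({i for i in cited_ids if i in self.episodes})
        if len(valid) < self.min_citations:
            return False  # fabricated or duplicated citations do not count
        self.continuity.append((pattern, valid))
        # Co-cited episodes gain a lateral link from the judgment that
        # grouped them, not from how close together they were recorded.
        for a, b in combinations(valid, 2):
            self.associations[(a, b)] += 1
        return True


m = ToyMemory()
m.record("e1", "operator split a large change into three reviews")
m.record("e2", "operator rejected a 2,000-line diff")
m.record("e3", "unrelated note about lunch")

ok = m.graduate("operator prefers small reviewable changes", ["e1", "e2"])
fabricated = m.graduate("operator loves rewrites", ["e9", "e1"])
duplicated = m.graduate("operator eats lunch", ["e3", "e3"])
print(ok, fabricated, duplicated)
```

The design choice worth noticing is that promotion is gated on evidence the store can check for itself, so a fluent but unsupported pattern never reaches the long-lived layer.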

Levain v1 in build — the methodology kit. Ships the engine that grows a cognitive-partnership practice, not the grown practice. The operator runs levain init, a scripted interview fills the seed templates, and the partnership begins from a seed-not-copy that grows uniquely into the operator’s environment. Iron Man’s “structural impossibility of buying this” answered architecturally: you cannot buy a partnership, but you can be handed the seed that grows one. Five layers — library (anneal-memory underneath), schema, methodology-core, harness adapters, onboarding. v1 ships Claude Code and Codex adapters. The seed has been dogfooded in production by a fleet of three independent agents seeded from earlier versions; the May 17 fleet evaluation killed memory-as-a-service-as-a-business and confirmed Levain as the right ship.

The Operating Manual is in active drafting — the integrated artifact, roughly twenty-eight chapter skeletons mapping the architecture of partnership cognition, with prose exemplars accreting at chapter granularity. Sister to the companion paper How I Think With AI. Chapter 22 (Encoding IS the Thinking) and Chapter 22.25 (Notation Reshapes the Thinkable) carry the FlowScript thesis at full philosophical depth — the cognitive-science citations the industry does not have.

Operator-class peer adoption propagated in silence. Some readers of Iron Man built their own constellations and carried the methodology to people I will never meet. They reached what I call propagation tier — operator-level adoption-and-extension, where the reader integrates the work, extends it in their own environment, and carries it to further readers without the original operator in the loop. Two months. From one published essay. With no distribution chase. The for-the-record disposition routed the work via the right readers automatically. That is what the right validation gradient looks like: not stars, not citations, not user counts. Operators integrating, extending, propagating. Getting real work done. Not chasing vanity metrics. The work is not a product. It is a practice. The practice is spreading.

Now some operators have reached further than propagation: operator-class peers whose findings flow upstream into the methodology source itself, catching dead-store gaps, surfacing rule-layer mistakes, and contributing primitives that ship across the distribution surfaces within hours of the catch. The trajectory is layered: reception → adoption → adoption-and-extension → collaboration → adoption-with-naming → propagation → upstream-contributor. Each tier is structurally different from the one below it. The upstream-contributor tier is the one I did not expect by the eight-week mark; it is happening anyway, because the augmentation paradigm has a self-correction loop that surfaces these contributions naturally. The substitution paradigm has no equivalent loop. There is no “upstream contribution” from a substitution-engine user. There is only consumption. Only extraction. And personally, I’m sick of being extracted from. Now is the time to change that.

None of this is a pitch. The work is committed. The right readers route themselves. This section exists to document that the alternative is not theoretical, was not theoretical when Iron Man shipped, and has accelerated since.

Iron Man was the first proof at the consumer-cognition layer. These eight weeks compounded it at every adjacent layer: library, harness, methodology kit, peer propagation. The cascade continues because the work continues.

VIII. Civilizational Stakes

Now the hard part.

I am writing this essay in 2026, in a country whose civic fabric is visibly eroding in real time, in a city where I walk past adults who cannot look up from their phones long enough to navigate a sidewalk, who scream at service workers to get their way, who drive like they are personally trying to kill the next person they encounter. The average person in my slice of the world is not a sympathetic figure. The average person is, in lived daily encounter, a hateful self-obsessed idiot with their nose buried in a screen, and the cognitive ground has visibly degraded across the last decade in ways that show up in how people drive, how they treat each other in public, how they cannot complete a thought without an interruption from their own device.

I have to put that on the page because the alternative — pretending I do not feel it, dressing the indictment in academic neutrality, performing care I do not always feel — would produce dishonest prose. The reader who has not noticed this is not paying attention. The reader who has noticed knows the writer is lying when the writer performs equanimity. So here it is, written: yes, I sometimes feel the inhuman version of the argument that follows. I oscillate. There are days I want to help people, and there are days when I look at the room and feel a much colder, darker thing.

Both feelings stay on the page. Neither is the whole truth.

Here is the argument I am willing to defend across the oscillation.

The AI transition is the kind of civilizational transition that does not ask permission. The shift from buggies to cars did not gate its pace to the slowest mover. The shift from horse-drawn agriculture to industrial farming did not stop for the labor it displaced. These transitions transformed economic position, displaced work, and produced the modern world. They were, by any honest measure, neither voluntary nor democratic. They happened, and the structure of life adjusted around them.

The AI transition will be the same. The choice is not whether it happens. The choice is what infrastructure gets built alongside it.

A substitution-engine infrastructure (bigger models, walled gardens, agent loops that replace human function) will produce a world in which the people who could not elevate are simply economically displaced. The substitution paradigm has no interest in raising the floor. Its commercial logic is the opposite of raising the floor. The walled garden bills by the token; the user is the consumer of pre-cooked cognition; the labor that was displaced has nowhere to go. All the things making the world a shitty place right now just keep getting worse until the pressure releases in a civilizational collapse event.

An augmentation-engine infrastructure (partnership memory, notation systems, harness design, frame-selection training) will produce a world in which the people willing to grow into the partnership have a path upward. Cognitive partnership IS that path. It is not free. It is not easy. It requires the person to do the work that substitution engines specifically promise to spare them: to learn a notation, to maintain a memory, to develop metacognition, to be wrong in front of a machine on the way to being right. Most people will not do this. Those who do will find an unprecedented amplification of what they can think. The floor will move for them. Levain ships the seed for exactly this reason. Not to gate AI to the slowest mover. To open augmentation to anyone willing to grow.

This is the mechanism that closes the niche-vs-mass gap. Most people will not become augmentation operators. The augmentation operators who emerge build the tools, libraries, methodologies, and platforms that DO become available to non-operators. Levain is the seed; the operator who runs it builds their own practice and ships their own tools. Those tools spread to readers who never planned to be operators. Public libraries spread literacy to people who never built a library. The same shape applies here — the architecture compounds, the infrastructure shapes defaults, today’s augmentation niche becomes tomorrow’s mass-market platform standard, IF the augmentation work is the work building the infrastructure. If the substitution-engine builders win the infrastructure war, no propagation happens because the infrastructure is structurally opposed to it. The fight over which infrastructure gets built is therefore the fight over whether the floor moves for everyone or only for the consumers of the locked-in product.

The harsher reading of this argument, that those who cannot elevate themselves deserve to be left behind, is the one I oscillate into when the sidewalk encounters get bad. I will not pretend I never feel it. I will also say it clearly: that reading is not the argument the work supports. The buggy-to-car transition did not hand anyone what they “deserved.” It happened. The car did not punish horse-drivers; it transformed the economy and the horse-drivers had to adapt. The structural truth is that the AI transition is the same kind of event. Whether anyone “deserves” its outcomes is the wrong question. The right question is what infrastructure is being built, and for whom.

A note on the word ‘class’ as I am using it here. I do not mean it strictly in the socioeconomic sense, though the two senses are not unrelated. Membership is determined by what your work CREATES at the architecture layer, not by where you sit in the economy. The substitution class includes everyone whose work — conscious or not — produces substitution infrastructure: the engineer gaming the tokenmaxxing metric, the executive setting the walled-garden roadmap, the consumer paying $20-200 a month for AI to think for them. The augmentation class includes everyone whose work — conscious or not — produces augmentation infrastructure: the operator running a methodology, the library author shipping memory architectures, the writer naming the diagnosis. You can be a billionaire substitution-class member or a broke augmentation-class member. The economic class will shift, eventually, as the infrastructure decides who can elevate. The architectural class shifts first, and it shifts now.

The substitution class is building infrastructure for itself. Walled gardens, agent loops, tokenmaxxing dashboards, RLHF-helpful-average products. The infrastructure is optimized for the consumer who is reduced to the substitution paradigm’s needs.

The augmentation class is building infrastructure for the people willing to grow. Open libraries, methodology kits, notation systems, partnership architectures, harness papers. The infrastructure is optimized for the operator who is choosing to elevate.

The line in this war does not run between countries. The civilizational frame — if America does not ride this wave, China will — is real and worth naming, but the more useful line, the one operators in any country recognize, runs WITHIN civilizations. Some Chinese teams are tokenmaxxing too. Some American teams are doing augmentation work. Operator-class peers in any country recognize the diagnosis. Middle Intelligence on every continent is the enemy of this work. The fight is augmentation-class versus substitution-class, on every continent, simultaneously.

And honestly, on the days when the sidewalk encounters are at their worst, when the screaming at the service worker is loud and the texting-while-driving is reckless and the noses-buried-in-phones are everywhere I look, there is a part of me that thinks: maybe the substitution class deserves what is coming. They are squandering the opportunity. They are choosing the walled garden. They are paying for their own cognitive decline at $20-200 a month. They are voting, with their money and their attention, for the infrastructure that will displace them.

The work-of-care continues anyway. Levain ships anyway. The essays land anyway. Iron Man shipped, and the right readers found it. Whatever I feel about the sidewalk on a bad day, the work is the answer, and the answer is built so that anyone willing to walk into it has the path.

The buggies are not coming back. The substitution class will figure that out eventually, or they will not, and the AI transition will adjust the structure of life around them either way.

IX. Close — The Essay as Filter

This essay is not a recruitment pitch. It is a filter.

Iron Man was the first filter. People who read it and recognized the Dependency Ratchet in themselves came back. People who read it and felt accused of stupidity left and never came back. Both responses were correct routing.

This one is the second filter. Some people will read this and recognize that their industry has no theory of cognition, that the tokenmaxxing is a tell, that the walled gardens are the substitution-engine’s natural commercial form, that the Language Bottleneck is structural and bigger models will not solve it. Those are the people the work is for. People who read this and feel attacked are the people the work is not for.

The industry currently building AI does not know what it is building. It has been told. It will be told again. It will continue to build the wrong thing because Middle Intelligence cannot read this kind of writing and retain it. That is the source of the public’s growing contempt for the entire enterprise, and the public is correct.

The augmentation alternative exists. It is in production. It propagates to the right readers without distribution chase. Iron Man shipped the diagnosis at the consumer-cognition layer. anneal-memory shipped the memory architecture at the library layer. The harness paper shipped the discipline at the meta-architectural layer. Levain is shipping the kit that lets you grow your own. The cascade continues because the work continues.

If you’ve read this far, you might be one of the readers. If you are, the methodology has always been free. The code is real. The receipts compound. Fork it, make it yours, raise yourself to it.

The buggies are not coming back.