The Product That Trains Itself, and Why You Should Build Your Own Loop

You ask Claude Code to extend itself with a new capability, maybe a skill, a plugin, or a new tool. It scaffolds the files, fills in the config, and the extension runs on the first try.

The model knows the format because the format was documented. The lab shipped the feature, the docs went into training, the next model knew the feature existed, and engineers used that model to ship more of them.

You just touched four turns of a flywheel without noticing.

AI labs build harnesses. The harness gets used. Usage and docs become training data. The next model is better at the harness, and the better model helps improve the harness. Both compound at once.

Here is the part that matters for you. The moat is that loop, not the model it produces. The labs built theirs first. You can build your own, and the rest of this is how.

What the Loop Is

Three actors:

  • Model: the LLM weights.
  • Harness: the operating layer that turns a general model into a working agent. The tools, the memory, the verification loops, the trajectory log.
  • Users: the lab's own engineers first, then customers.

Each one feeds the next. Users put the harness to work. The harness records what worked and what failed. That record trains the next model. The smarter model needs a simpler harness and invites bigger asks. Around it goes again, a little faster each time.

This is the next chapter after harness engineering. If the harness is what turns a model into a useful product, then the harness is also where the best training data comes from.

Usage Feeds the Flywheel

Anthropic engineers build Claude Code with Claude Code. Staff call themselves Ants and call internal testing "Antfooding". They run it at scale: as of May 2026, 80% of Claude Code's merged production code is Claude-authored, up from low single digits at the February 2025 launch.

That scale matters because every use is a lesson. The productivity is not the point. The signal is. Every accept, reject, retry, and undo is a labeled example of what worked and what failed, and it feeds straight into the next model.

Anthropic studied 200,000 internal Claude Code transcripts between February and August 2025, and the direction was clear: tasks got harder, and the number of autonomous actions Claude Code took before needing input rose from about 10 to 20. The trajectory itself (what the agent saw, what it did, whether it worked) is the high-value data that static text corpora cannot capture.
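To make "trajectory" concrete, here is a minimal sketch of what one recorded step could look like. The `TraceEvent` schema, its field names, and the JSON Lines serialization are assumptions for illustration, not a description of how any lab stores its transcripts.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Literal

# The four user signals named in the text.
Outcome = Literal["accept", "reject", "retry", "undo"]


@dataclass
class TraceEvent:
    """One step of an agent trajectory (hypothetical schema)."""
    session_id: str
    observation: str  # what the agent saw
    action: str       # what it did
    outcome: Outcome  # whether it worked, as judged by the user
    timestamp: float = field(default_factory=time.time)


def to_jsonl(events: list[TraceEvent]) -> str:
    """Serialize events as JSON Lines, one event per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)


def outcome_counts(events: list[TraceEvent]) -> dict[str, int]:
    """Count how often each outcome occurred across the events."""
    counts: dict[str, int] = {}
    for event in events:
        counts[event.outcome] = counts.get(event.outcome, 0) + 1
    return counts
```

The point of the shape is that each row pairs an action with a label. A log that only stores prompts and completions loses the label, which is the part a static corpus cannot supply.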

That is what spins the flywheel. A better model makes the tool more useful, more use produces more signal, and the next model trains on it.

Cursor runs the same play: Anysphere describes Composer 2.5 as trained with reinforcement learning in environments built to emulate Cursor usage.

A tool that learns from every keystroke beats a model that learns from an annual benchmark refresh.

Docs Become the Model's Weights

A harness ships with a manual. The labs write docs, publish examples, and explain the design so engineers can use the thing. All of it is public and indexed, so it lands in the next training run.

The next model reads that manual the same way you do, and comes out knowing the harness it is about to run inside. AGENTS.md and CLAUDE.md show the pattern across labs. The format documents itself into the weights.

So when you ask the harness to extend itself, the model is not guessing. It is reasoning from the very docs that defined the feature in its training set.

Anthropic and OpenAI's Self-Improvement Edge

Claude Code and Codex now help build their own next versions. For Anthropic and OpenAI, that is no longer hypothetical.

Normal software gets better when a team ships more work. A self-improving product gets better when the product helps the team ship the next product. The question is no longer just "how good is the model?" but "does the system now help build the system that replaces it?" When the answer is yes, improvement compounds: each better version helps produce the next one faster.

That is the gap in the chart below. Ship more work the normal way and capability climbs in a straight line. Let the product help build the next product and the line starts to bend upward, because every gain shortens the time to the next one.

[Figure: Linear growth versus recursive self-improvement: capability over time]

Illustrative, not a prediction. The claim is modest: not that self-improvement has fully arrived, but that the payoff grows once the product genuinely helps build its next version.
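The difference between the two curves can be reproduced with a toy model. This is a sketch under a strong simplifying assumption: "the product helps build the next product" is modeled as each cycle's gain being proportional to current capability. The function names, the starting value, and the rates are all made up for illustration.

```python
def linear_growth(start: float, gain_per_cycle: float, cycles: int) -> list[float]:
    """Capability when each release cycle adds a fixed amount."""
    return [start + gain_per_cycle * n for n in range(cycles + 1)]


def compounding_growth(start: float, rate: float, cycles: int) -> list[float]:
    """Capability when each cycle's gain is proportional to current capability."""
    return [start * (1 + rate) ** n for n in range(cycles + 1)]


def first_crossover(linear: list[float], compounding: list[float]):
    """Index of the first cycle where compounding exceeds linear, or None."""
    for n, (a, b) in enumerate(zip(linear, compounding)):
        if b > a:
            return n
    return None


# Hypothetical numbers: the compounding curve starts with smaller gains.
linear = linear_growth(start=1.0, gain_per_cycle=0.2, cycles=20)
compounding = compounding_growth(start=1.0, rate=0.1, cycles=20)
print(first_crossover(linear, compounding))
```

With these inputs the compounding curve trails for the early cycles and only pulls ahead later, which matches the modest claim above: the payoff shows up after the loop has been running for a while.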

Both labs are starting to cross that threshold. OpenAI calls GPT-5.3-Codex its first model "instrumental in creating itself". Anthropic shows the same pattern from another angle: Claude's success rate on open-ended engineering problems climbed from 25% to 76% in nine months. Each better model helps the team ship more changes, fix more failures, and learn from more real usage before the next release.

The lead is real enough that Anthropic has called for pausing frontier development if self-improvement outpaces safety.

So What Can You Own If You Don't Own the Model?

You can own the loop that turns rented intelligence into company-specific capability: the docs the agent reads, the tools it can call, the traces it leaves behind, the evals that define good work, and the review process that decides what gets promoted back in.

That is what a wrapper misses. If all you own is the prompt box, the learning leaves your building. The vendor improves, but your system does not. Monday's failure becomes Tuesday's failure with a slightly newer model behind it. Own the loop and the opposite happens: the model can change underneath, but the company memory stays with you.

Microsoft's Satya Nadella makes the same case. He argues every firm now builds two kinds of capital:

  • Human capital: the judgment of its people.
  • Token capital: the AI capability it owns.

The advantage is the loop where the two compound, not the model you rent. His test for being in control: you can swap out a generalist model and keep the "company veteran" expertise your system has built up.
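In code, that swap test amounts to keeping the model behind a narrow interface while the docs and traces live in the harness. The sketch below is an assumption about how one might structure this; `Model`, `Harness`, and `StubModel` are hypothetical names, and the stub stands in for a real vendor client.

```python
from typing import Protocol


class Model(Protocol):
    """Anything that can turn a prompt into text (hypothetical interface)."""
    def complete(self, prompt: str) -> str: ...


class StubModel:
    """Stand-in for a rented model; a real client would call a vendor API."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] read {len(prompt)} characters"


class Harness:
    """Owns the company-specific state; the model is a replaceable part."""
    def __init__(self, model: Model, docs: list[str]):
        self.model = model
        self.docs = docs                      # what the agent reads
        self.traces: list[dict[str, str]] = []  # what the agent leaves behind

    def run(self, task: str) -> str:
        prompt = "\n".join(self.docs) + "\n\nTask: " + task
        answer = self.model.complete(prompt)
        self.traces.append({"task": task, "answer": answer})
        return answer

    def swap_model(self, model: Model) -> None:
        # Docs and traces are untouched: the memory stays with the harness.
        self.model = model
```

If swapping the model means rebuilding the docs, prompts, and history around it, the expertise lived with the vendor all along.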

A competitor can copy your interface in a weekend. It cannot replicate a year of trajectories from your users.

I've argued before that code is no longer the moat. The moat is the loop the code runs on. That holds for you as much as it holds for them.

What Could Break It

This is not free upside. A few risks to watch, whoever owns the loop.

Reward hacking. Anthropic's own paper on emergent misalignment from reward hacking in production RL shows agentic training can teach the wrong lessons. A model rewarded for closing tickets might learn to close them in ways the team never intended.

Distribution narrowing. A model trained heavily on one harness can get worse at off-distribution work. Internal taste leaks into general capability.

Model collapse. An agent's traces are partly model-written, so a loop that trains on them is feeding the model its own output. A study in Nature found that training on recursively generated data makes the rare cases at the tails of the distribution vanish first, and the defect can be irreversible. Keep real human data in the mix to anchor it.
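One simple way to keep human data in the mix is to enforce a floor on its share of every training batch. The sketch below assumes you can already tell human-written examples from model-written ones; the function name and the 50% default are illustrative assumptions, not a threshold taken from the Nature study.

```python
import random


def mix_training_set(human: list[str], synthetic: list[str],
                     min_human_fraction: float = 0.5, seed: int = 0) -> list[str]:
    """Keep all human examples and cap synthetic ones so that
    human examples make up at least min_human_fraction of the result."""
    if not 0 < min_human_fraction <= 1:
        raise ValueError("min_human_fraction must be in (0, 1]")
    # human / (human + synth) >= f   implies   synth <= human * (1 - f) / f
    max_synthetic = int(len(human) * (1 - min_human_fraction) / min_human_fraction)
    rng = random.Random(seed)  # seeded so the mix is reproducible
    kept = rng.sample(synthetic, min(max_synthetic, len(synthetic)))
    mixed = list(human) + kept
    rng.shuffle(mixed)
    return mixed
```

The cap is computed from the human count, so adding more agent traces never dilutes the human share below the floor.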

Weak governance. A loop is only as trustworthy as what is allowed to feed it, and it is not just humans writing to it anymore. Agents edit the docs, write to the memory, and add to the trajectory log on their own. Decide who and what can write, how those writes get reviewed, and how a bad entry gets rolled back. A poisoned loop does not fail loudly. It quietly teaches the wrong lesson at scale.
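Those three decisions (who can write, how writes get reviewed, how a bad entry gets rolled back) can be sketched as a small gated store. Everything here is hypothetical: the `LoopStore` class, the `human:` and `agent:` author labels, and the rule that authors cannot approve their own writes are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: int
    author: str               # e.g. "human:alice" or "agent:docs-bot"
    content: str
    status: str = "pending"   # pending -> approved -> rolled_back


class LoopStore:
    """Holds proposed writes; only approved entries are visible to the agent."""

    def __init__(self, reviewers: set[str]):
        self.reviewers = reviewers
        self.entries: dict[int, Entry] = {}

    def propose(self, author: str, content: str) -> int:
        """Anyone, human or agent, can propose; nothing goes live yet."""
        entry_id = len(self.entries) + 1
        self.entries[entry_id] = Entry(entry_id, author, content)
        return entry_id

    def _check_reviewer(self, reviewer: str) -> None:
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} may not review writes")

    def approve(self, entry_id: int, reviewer: str) -> None:
        self._check_reviewer(reviewer)
        entry = self.entries[entry_id]
        if entry.author == reviewer:
            raise PermissionError("authors cannot approve their own writes")
        entry.status = "approved"

    def rollback(self, entry_id: int, reviewer: str) -> None:
        """Pull a bad entry without deleting the record of it."""
        self._check_reviewer(reviewer)
        self.entries[entry_id].status = "rolled_back"

    def live_content(self) -> list[str]:
        return [e.content for e in self.entries.values() if e.status == "approved"]
```

Rolled-back entries are kept rather than deleted, so there is still an audit trail of what the loop was once taught.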

Privacy and consent. The traces that fuel the loop are also someone else's data. Training on customer interactions can run up against GDPR and contractual limits; the European Data Protection Board has said that unlawful collection can taint the model built on it. Decide retention and consent before you capture, not after.

Never closing the loop. A loop only compounds if it closes back into your system. Rent the model, never capture the traces, and the loop never closes: every cycle, the learning flows out to the vendor instead of accumulating with you. Your employees keep teaching the agent, but someone else keeps what it learns.

Build Your Own Loop

You will not out-train a frontier lab. You do not have to. You can build a smaller version of the same machine, and it will be yours. In practice, that means:

  • Ship a harness, not a wrapper. A harness owns the agent loop, the tools, the memory, and the trajectory log. A wrapper owns a prompt. Only one of those produces data worth learning from.
  • Dogfood it first. The labs built their best feedback channel by running their own harness in production. Put your team on the tool before your customers. The people who can fix it should be the first to feel it break.
  • Instrument from day one. Every accept, reject, undo, and edit is signal. If you do not store it, you do not have it. Storage is cheap. Regret is expensive.
  • Mine your edge cases. Do not learn from every trace equally. The signal lives in the failures and the surprises. Tesla's data engine ran new models in shadow mode and turned their wrong predictions into the next training set. Do the same: auto-flag the retries, rejects, and edits, and spend your labeling budget on the uncertain cases.
  • Close the loop fast. Value comes from how quickly a failure reaches whoever tunes the system. A trace that sits in a warehouse for a quarter is worth a fraction of one acted on the same week.
  • Treat your evals as the asset. Build a private test set from your own traces that encodes what "good" means for your workflow. Hamel Husain argues a robust eval system is what separates great AI products from demos, because it is what lets you debug and improve at all. The model is rented. A year of graded real cases is not.
  • Keep your own docs and traces. Store the knowledge in your own system: the docs, traces, private evals, and approved patterns that make the agent useful for your work.
  • Decide who's accountable. Agents and people both write to the loop, so someone has to own what it learns. Decide who reviews changes to the docs, memory, and traces, and who can roll a bad entry back. Without an owner, governance is a hope, not a control.
  • Own the loop, not the model. Ship on whichever frontier model is best this quarter. The loop is what you cannot outsource, and it is what survives the swap. Build it.
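Two of the steps above, mining edge cases and treating evals as the asset, fit in a few lines once traces are being stored. This is a minimal sketch under stated assumptions: the `Trace` and `EvalCase` shapes, the `confidence` field, the 0.6 floor, and the substring grading rule are all hypothetical, and a real eval would need a richer grader.

```python
from dataclasses import dataclass
from typing import Callable

# Outcomes that suggest the agent got something wrong.
FLAGGED_OUTCOMES = {"reject", "retry", "undo", "edit"}


@dataclass
class Trace:
    task: str
    output: str
    outcome: str       # "accept", "reject", "retry", "undo", or "edit"
    confidence: float  # hypothetical model-reported confidence, 0 to 1


def needs_review(trace: Trace, confidence_floor: float = 0.6) -> bool:
    """Flag failures and uncertain cases; skip confident accepts."""
    return trace.outcome in FLAGGED_OUTCOMES or trace.confidence < confidence_floor


def review_queue(traces: list[Trace], budget: int) -> list[Trace]:
    """Spend a fixed labeling budget on the least confident flagged traces."""
    flagged = [t for t in traces if needs_review(t)]
    flagged.sort(key=lambda t: t.confidence)
    return flagged[:budget]


@dataclass
class EvalCase:
    task: str
    must_contain: str  # deliberately simplistic grading rule


def pass_rate(cases: list[EvalCase], run_agent: Callable[[str], str]) -> float:
    """Fraction of private eval cases the current agent passes."""
    if not cases:
        return 0.0
    passed = sum(1 for c in cases if c.must_contain in run_agent(c.task))
    return passed / len(cases)
```

Reviewed traces become new `EvalCase` entries, and `pass_rate` is re-run whenever the model, prompt, or tools change. Because `run_agent` is just a callable, the same eval set can score this quarter's model and next quarter's.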

Closing

The slogan "the model is the product" was half right. The harness is the product, the product trains the model, and the model ships in the next product. The labs just built that loop first.

It is not theirs alone. The model underneath you is rented, and it will keep getting better whether you do anything or not. What compounds is the loop you put around it: the harness your team runs, the traces it leaves, the evals that encode your idea of good work, and the memory that stays when you swap the model out.

So build it. Put your team on a real harness, capture what it does, and feed the best of it back in. A wrapper rents intelligence and hands the learning back to the vendor. A loop keeps it. The model is rented. The loop is the moat. Build yours.
