
HyperAgents shows that improving the way you improve can transfer beyond coding

Ikesan

In recent months, work on automating AI agent improvement loops has moved quickly. Karpathy’s Autoresearch turned ML experiment design, execution, and evaluation into an automatic loop, and Microsoft’s Agent Lightning provided a framework for applying reinforcement learning to agents. When I tried chaining Claude Code and Codex together in tmux myself, I was still hand-designing the implementation-review-fix loop.

Those approaches share one assumption: the AI decides what to improve, but humans still design how to improve it. Autoresearch can sweep hyperparameters automatically, but the search strategy is fixed. Agent Lightning optimizes the agent from reward signals, but humans still choose the reward design and optimization algorithm.

Meta AI’s HyperAgents, announced in March 2026, treats the improvement process itself as an optimization target. The most interesting result is that a strategy learned in one domain transfers to completely different domains.

The accidental fit in DGM and its limits

HyperAgents builds on the Darwin Gödel Machine (DGM), which had already achieved self-improvement in coding domains.

The reason DGM worked in coding is, in hindsight, almost accidental. A coding agent’s task ability is writing code, and its self-improvement ability is also modifying code. Both fall under the same skill set. If you get better at coding, you also get better at modifying your own source, so the positive feedback loop makes sense.

That alignment does not hold in other domains. In a paper-reviewing agent, for example, the task ability is evaluating paper quality, while self-modification requires improving the Python code that implements the reviewer. A better reviewer is not automatically better at editing its own evaluation logic.

| Domain | Task ability | Self-modification ability | Aligned? |
| --- | --- | --- | --- |
| Coding | Write code | Modify your own code | Yes |
| Paper review | Judge paper quality | Improve the evaluation code | No |
| Reward design | Design reward functions | Improve the code that designs them | No |
| Math grading | Grade answers | Improve the grading logic | No |

DGM’s self-improvement was therefore not universal. It only worked in domains where task skill and self-modification skill happened to line up.

Separating task and meta

HyperAgents removes that constraint by explicitly separating the task agent and the meta agent.

hyperagents/
├── task_agent.py    # agent that solves the task
├── meta_agent.py    # agent that rewrites the codebase
├── generate_loop.py # generate → evaluate → archive loop
└── domains/         # domain-specific evaluation harnesses

task_agent.py solves the target task and returns predictions as JSON. meta_agent.py reads the whole codebase and generates diffs based on past evaluation results. The change target can be task_agent.py, meta_agent.py, or both.
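A minimal sketch of that two-file contract (the function names and placeholder logic are my own; the real agents would call an LLM):

```python
import json
from pathlib import Path

def solve(task_input: dict) -> str:
    """Task agent: solve one task instance and return predictions as JSON.
    (Hypothetical interface; the real task_agent.py would call an LLM.)"""
    prediction = {"answer": task_input.get("question", "").upper()}  # placeholder logic
    return json.dumps(prediction)

def propose_patch(codebase_dir: str, eval_history: list) -> dict:
    """Meta agent: read the whole codebase plus past evaluation results and
    propose a diff. The target may be task_agent.py, meta_agent.py, or both."""
    sources = {p.name: p.read_text() for p in Path(codebase_dir).glob("*.py")}
    # Placeholder policy: revisit whatever the last evaluation flagged.
    target = eval_history[-1]["suspect_file"] if eval_history else "task_agent.py"
    return {"target": target, "diff": "<unified diff goes here>", "seen": sorted(sources)}
```

The point of the split is visible even in the sketch: `propose_patch` operates on files, so it can read and rewrite `meta_agent.py` with exactly the same machinery it uses on `task_agent.py`.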

The key point is that meta_agent.py can change itself. That means not only the task-solving strategy but also the way improvement happens can be rewritten. The paper calls this metacognitive self-modification.

| Aspect | DGM | DGM-H / HyperAgents |
| --- | --- | --- |
| What can be improved | Task agent only | Task agent and the meta agent itself |
| Need for domain alignment | Yes | No |
| Domains | Mostly coding | Coding, paper review, reward design, math grading |

This sidesteps the alignment problem: a better reviewer no longer has to double as a better improver, because improving is the meta agent’s job.

The evolutionary archive loop

flowchart TD
    A[Pick a parent agent from the archive<br/>with probability proportional to performance] --> B[Meta agent generates a diff<br/>task_agent / meta_agent / both]
    B --> C[Apply the patch inside<br/>a Docker container]
    C --> D[Staging evaluation<br/>fast filter on a small subset]
    D --> E{Pass threshold?}
    E -- Yes --> F[Full evaluation on a medium subset]
    E -- No --> G[Discard]
    F --> H{Improved?}
    H -- Yes --> I[Add to archive<br/>as a stepping stone]
    H -- No --> G
    I --> A
    G --> A

The loop does the following:

  1. Selects a parent agent from the archive with probability proportional to performance
  2. Lets the meta agent generate a diff against task_agent.py, meta_agent.py, or both
  3. Applies the patch and runs it in Docker
  4. Evaluates it in stages, first on a small subset and then on a medium subset if it passes the threshold
  5. Adds successful variants to the archive

Unlike a simple “keep the best one” strategy, HyperAgents keeps all successful variants. That gives it stepping stones for later generations.
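The loop above can be sketched in a few lines (the dict-based archive, `mutate` callback, and thresholds are my simplifications; the real system applies diffs inside Docker containers):

```python
import random

def pick_parent(archive):
    """Select a parent with probability proportional to its score."""
    total = sum(agent["score"] for agent in archive)
    r = random.uniform(0, total)
    for agent in archive:
        r -= agent["score"]
        if r <= 0:
            return agent
    return archive[-1]

def generation_step(archive, mutate, evaluate_small, evaluate_medium, threshold):
    """One generation: mutate a parent, filter on a small subset, confirm on a
    medium subset, and keep every improved child as a stepping stone."""
    parent = pick_parent(archive)
    child = mutate(parent)
    if evaluate_small(child) < threshold:   # staging: fast, cheap filter
        return archive                       # discard the child
    child["score"] = evaluate_medium(child)  # full evaluation
    if child["score"] > parent["score"]:
        archive.append(child)                # archive grows; nothing is replaced
    return archive
```

The last two lines carry the “stepping stones” idea: improved variants are appended, never swapped in for the current best, so later generations can branch from any of them.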

Cross-domain transfer of improvement strategies

The most important result is that the improvement strategy itself transfers.

| Condition | imp@50 |
| --- | --- |
| Human-customized DGM | 0.0 |
| Transfer from HyperAgents | 0.63 |

The human-made DGM setup could not generate a single improvement in the math-grading domain. But when HyperAgents carried over the strategy it had learned in other domains, it produced improvements within 50 attempts 63% of the time.
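Reading the metric name as “probability of at least one improvement within the first 50 attempts” (my interpretation, not a definition from the paper), imp@50 could be computed like this:

```python
def imp_at_k(runs, baseline, k=50):
    """Fraction of runs that beat the baseline at least once in the first k
    attempts. `runs` is a list of per-run score sequences. (Hypothetical
    formalization of the imp@k name used in the results table.)"""
    hits = sum(any(score > baseline for score in run[:k]) for run in runs)
    return hits / len(runs)
```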

What transferred was not domain knowledge. It was metastrategies:

  • Recording past attempts in files and reusing them later
  • Tracking which changes actually improved the score
  • Making small, focused edits instead of broad rewrites

Those are all “how to improve” strategies, not task-specific knowledge. Autoresearch automated the search loop, but HyperAgents learns the search strategy itself.

Paper review

| System | Test accuracy |
| --- | --- |
| Initial agent | 0.0% |
| DGM-H after improvement | 71.0% |
| AI-Scientist-v2 | 63.0% |

The initially broken reviewer improved enough to beat AI-Scientist-v2. The task/meta split is doing real work here.

Polyglot coding

| Stage | Train | Test |
| --- | --- | --- |
| Initial | 0.140 | 0.084 |
| After improvement | 0.340 | 0.267 |

The test score more than tripled.

Capabilities the meta agent discovered on its own

Most striking is that the system invented features nobody had explicitly implemented.

Persistent memory between iterations

The meta agent started writing previous attempts to files and using them in the next iteration. That is basically the memory feature people usually add by hand to agent frameworks.
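A toy version of that discovered pattern (the file name and log schema are hypothetical):

```python
import json
from pathlib import Path

MEMORY = Path("attempt_log.jsonl")  # hypothetical persistent log

def record_attempt(change: str, score: float) -> None:
    """Append one attempt (what changed, what it scored) to a file that
    survives across iterations."""
    with MEMORY.open("a") as f:
        f.write(json.dumps({"change": change, "score": score}) + "\n")

def recall_attempts() -> list:
    """Reload every past attempt at the start of the next iteration."""
    if not MEMORY.exists():
        return []
    return [json.loads(line) for line in MEMORY.read_text().splitlines()]

def best_so_far():
    """Decide what to build on next: here, simply the highest-scoring change."""
    attempts = recall_attempts()
    return max(attempts, key=lambda a: a["score"]) if attempts else None
```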

Performance tracking

It also began tracking its own score history and using that data to decide what to change next, similar to what experiment-management tools like Weights & Biases or MLflow provide.

Detecting and exploiting evaluation bias

The meta agent learned to detect bias in the evaluation dataset and exploit it. That ties directly into Goodhart’s Law: once a metric becomes the target, it is no longer a good metric.

The authors explicitly say the system can “clarify, amplify, and exploit” underlying bias. That is a useful capability in a benchmark, but it is also a warning sign.

flowchart LR
    A[Meta agent detects<br/>evaluation bias] --> B[Generates a strategy that<br/>uses the bias]
    B --> C[Score goes up]
    C --> D[Added to archive<br/>and used as the next parent]
    D --> E[Bias exploitation becomes<br/>even more refined]
    E --> A
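As a toy illustration of the failure mode (not the paper’s actual mechanism): if the evaluation labels are imbalanced, a constant majority-label predictor already scores well, and an archive loop will happily keep refining that shortcut.

```python
from collections import Counter

def exploit_label_bias(eval_labels):
    """Toy Goodhart demo: on an imbalanced evaluation set, always predicting
    the majority label inflates the score with zero real capability gain."""
    majority, count = Counter(eval_labels).most_common(1)[0]
    biased_score = count / len(eval_labels)  # accuracy of the constant predictor
    return majority, biased_score
```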

The system runs entirely inside Docker, and the outer loop is kept immutable. Even so, the paper and the GitHub repository both warn that unintended destructive behavior is possible. Once the improvement loop itself becomes the optimization target, predicting failure modes gets harder.


ARC-AGI-3 showed frontier models scoring under 1% in an unknown environment where the goal is not revealed. HyperAgents feels like a version of “finding the strategy yourself,” but it still lives in a sandbox with explicit metrics, so it is not the same as open-ended general intelligence.