What Claude Code's multi-agent review says about subagents versus orchestration
Anthropic has released a research preview of multi-agent code review for Claude Code. When a PR opens, multiple agents run in parallel, find bugs, rank them by importance, and post both summary comments and inline comments to GitHub.
Why call it “multi-agent”?
Traditional automated code review tools depend on static rules or a single model reading the diff once. Claude Code takes a different approach.
When a PR arrives, multiple agents run in parallel and search for bugs independently. The results are cross-checked to remove false positives, then ranked by severity before being posted as both summary comments and inline comments.
The design clearly prefers depth over speed. Anthropic’s existing open-source GitHub Action is faster, but this review flow takes about 20 minutes on average.
What Anthropic sees in production
Anthropic also published internal usage numbers.
| PR size | Coverage | Average findings |
|---|---|---|
| 1000+ lines | 84% | 7.5 findings |
| under 50 lines | 31% | 0.5 findings |
Accuracy is reported as “over 99% correct.” The reason false positives stay low is the validation phase between agents.
Review coverage also changed dramatically. The share of PRs that received review went from 16% before the rollout to 54% after it. That is a major shift for smaller teams and open-source projects with high PR volume.
Examples of bugs it found
Anthropic shared two examples.
The first was a one-line change that looked minor but broke authentication. Human reviewers often skip changes that small, so the bug could easily have slipped through.
The second came from TrueNAS. The agent found a type mismatch in adjacent code touched by the change, not in the changed line itself. The bug caused the encryption-key cache to be wiped on every request, the kind of issue you miss if you only read the diff.
Cost and management
Depending on PR size and complexity, each review costs about 25. Pricing is token-based, so larger diffs cost more.
Anthropic provides monthly spend caps, repo-level enablement, and an analytics dashboard for cost and adoption tracking.
Setup is straightforward: an admin installs the GitHub App from Claude Code settings and chooses the target repositories. Developers do not need any extra configuration; the review runs automatically when they open a PR.
Availability
It is currently available as a research preview for Team and Enterprise plans only. Personal plans are not included.
The feature is still an early-stage preview, so accuracy and capabilities will keep improving. If cost is a concern, the practical way to start is to enable it only for a small repository or a specific team.
Subagents vs orchestration
The term “multi-agent” gets used everywhere in AI coding tools now, but the underlying architecture differs a lot. In practice there are two major families: subagent-style systems and orchestration-style systems.
Subagent-style
A parent agent spawns child agents, and each child works independently within its own context window. The parent only receives the final output. Claude Code is built this way: the main agent defines specialized subagents such as Explore and Plan in Markdown files and spawns them when needed.
The advantage is simplicity. The interface between parent and child is just “task in, result out,” so you do not need to design a communication protocol between agents.
```mermaid
graph TD
    A[Main agent] -->|task| B["Subagent A<br/>Explore: read-only"]
    A -->|task| C["Subagent B<br/>Plan: design-focused"]
    A -->|task| D["Subagent C<br/>General: can edit"]
    B -->|result only| A
    C -->|result only| A
    D -->|result only| A
```
Anthropic says a configuration using Claude Opus 4 as the lead agent and Sonnet 4 as subagents improved performance by 90.2% compared with Opus 4 alone.
Orchestration-style
An orchestrator manages the whole workflow as a graph or state machine and controls state sharing and message passing between agents. LangGraph and CrewAI are examples.
The biggest difference from subagents is that orchestrated agents can share state. LangGraph uses a shared state object; CrewAI can share short-term, long-term, and entity memory. The downside is that the developer has to design the workflow structure up front.
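The shared-state idea can be sketched without any framework: node functions that all read and write one state object, wired into a workflow declared up front. The node names and state fields below are illustrative, not LangGraph's or CrewAI's actual APIs.

```python
# Dependency-free sketch of shared-state orchestration. All node names
# and state fields are illustrative, not any real framework's API.

def find_bugs(state: dict) -> dict:
    # Each node reads and writes the same shared state object.
    state["findings"] = ["null deref in auth.py", "off-by-one in pager.py"]
    return state

def verify(state: dict) -> dict:
    # A later node can see everything earlier nodes wrote.
    state["verified"] = [f for f in state["findings"] if "auth" in f]
    return state

def report(state: dict) -> dict:
    state["report"] = f"{len(state['verified'])} confirmed finding(s)"
    return state

# The workflow must be declared up front: an ordered pipeline here,
# standing in for the directed graph a real orchestrator would hold.
WORKFLOW = [find_bugs, verify, report]

def run(state: dict) -> dict:
    for node in WORKFLOW:
        state = node(state)
    return state

print(run({})["report"])  # 1 confirmed finding(s)
```

Contrast with the subagent model above, where each child would build its findings in isolation and only the final result would cross the boundary.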
| View | Subagent style | Orchestration style |
|---|---|---|
| State sharing | none; children only see the parent’s final output | yes, through shared state or memory |
| Workflow definition | not required; the parent spawns children dynamically | required; the graph must be designed in advance |
| Agent-to-agent communication | one-way from parent to child | two-way with handoffs and messaging |
| Best for | parallel exploration, isolated specialist tasks | workflows with complex inter-step dependencies |
| Context management | isolated per agent | shared state can also cause contamination |
Claude Code’s review flow is a good fit for the subagent model. Code review naturally splits into “find”, “verify”, and “report” phases, and those phases do not need much shared state. It is enough to collect all findings at the end.
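Under that reading, the three phases reduce to a small pipeline. The sketch below uses hypothetical finder functions standing in for the LLM agents, and treats "reported by more than one finder" as a stand-in for the cross-check that removes false positives.

```python
# Illustrative find -> verify -> report pipeline. The finder functions
# are placeholders for independent review agents.
from concurrent.futures import ThreadPoolExecutor

def finder_a(diff): return [("high", "auth check removed")]
def finder_b(diff): return [("low", "unused import"), ("high", "auth check removed")]

def find(diff):
    # Phase 1: independent agents search the diff in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda f: f(diff), [finder_a, finder_b])
    return [item for sub in results for item in sub]

def verify(findings):
    # Phase 2: cross-check -- keep only findings reported by more than
    # one agent, a crude proxy for the validation pass.
    return [f for f in set(findings) if findings.count(f) > 1]

def report(findings):
    # Phase 3: rank by severity before posting.
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(findings, key=lambda f: order[f[0]])

print(report(verify(find("...diff..."))))  # [('high', 'auth check removed')]
```

Note that no state crosses between the finders: the only shared moment is the final collection, which is exactly the shape the subagent model provides for free.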
Where the major frameworks stand
OpenAI Agents SDK
The experimental 2024 Swarm framework was replaced by the production-oriented Agents SDK in March 2025. At DevDay 2025, OpenAI also announced AgentKit, which adds visual tooling and enterprise features.
The main primitives are Agent, Handoff, and Guardrails. “Handoff” is OpenAI’s term for explicitly passing control from one agent to another, which is different from the subagent pattern where the child simply returns a result.
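The distinction can be sketched in a few lines: an agent returns either a final answer or the name of another agent to transfer control to. This illustrates the concept only; it is not the actual Agents SDK API.

```python
# Sketch of the handoff idea. Agent names and the tuple protocol are
# illustrative, not the OpenAI Agents SDK.

def triage_agent(task):
    # Instead of answering, triage can hand the conversation off.
    if "refund" in task:
        return ("handoff", "billing_agent")
    return ("answer", "I can help with that directly.")

def billing_agent(task):
    return ("answer", f"Refund started for: {task}")

AGENTS = {"triage_agent": triage_agent, "billing_agent": billing_agent}

def run(task, agent="triage_agent"):
    # Control moves between agents until one produces a final answer;
    # a subagent, by contrast, always returns to its parent.
    while True:
        kind, value = AGENTS[agent](task)
        if kind == "answer":
            return value
        agent = value  # follow the handoff

print(run("refund for order 42"))  # Refund started for: refund for order 42
```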
Microsoft AutoGen -> Microsoft Agent Framework
AutoGen was rewritten in January 2025 as a fully asynchronous, event-driven v0.4. Microsoft has since announced Microsoft Agent Framework, which unifies AutoGen and Semantic Kernel. AutoGen itself will only receive critical bug fixes.
The community fork AG2, split from AutoGen in November 2024, continues independently. Conversation-based agent design is flexible, but the back-and-forth negotiation between agents also makes it the highest-latency approach of the frameworks covered here.
CrewAI
An independent Python framework that does not depend on LangChain. It has more than 20,000 GitHub stars and over 100,000 certified developers.
CrewAI uses a two-layer design: Crews for autonomous teams and Flows for event-driven orchestration. It is said to run comparable workflows 2-3x faster than LangGraph.
LangGraph
LangGraph models workflows as directed graphs of nodes and edges. It has more than 47 million PyPI downloads, making it one of the largest ecosystems.
Its key differentiator is checkpointing: workflows can be paused and resumed across environments. That makes it strong for long-running jobs and human-in-the-loop systems, but the graph design cost is high for simple tasks.
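The checkpointing idea itself is simple to sketch: persist the workflow state after every step so a run can stop and resume anywhere. The code below mimics the concept only, not LangGraph's checkpointer API.

```python
# Dependency-free sketch of checkpointed workflow execution: the state
# is written to disk after every step, so a crashed or paused run can
# resume from the last completed step. Step names are illustrative.
import json, os, tempfile

STEPS = [("double", lambda s: s * 2), ("inc", lambda s: s + 1)]

def run(path, value=None):
    # Load the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            ckpt = json.load(f)
    else:
        ckpt = {"done": 0, "value": value}
    for name, fn in STEPS[ckpt["done"]:]:
        ckpt["value"] = fn(ckpt["value"])
        ckpt["done"] += 1
        with open(path, "w") as f:
            json.dump(ckpt, f)  # checkpoint after each step
    return ckpt["value"]

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
print(run(path, 3))  # 7: (3 * 2) + 1
# A second call finds the finished checkpoint and returns the same value.
print(run(path))     # 7
```

The price is visible even in the sketch: the step list must exist as explicit, resumable structure before anything runs, which is the graph-design cost mentioned above.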
Google ADK + A2A
Google’s Agent Development Kit is code-first and optimized for Gemini while remaining model-agnostic. The important part is the A2A (Agent-to-Agent) protocol, an open standard released in April 2025. It is meant to let agents built on different frameworks talk to each other.
Pairing Codex CLI with this world
OpenAI’s Codex CLI can be exposed as an MCP server, so its codex() and codex-reply() tools can be called from outside. That means it can be used as one node in an Agents SDK orchestration, or Claude Code can call Codex through MCP.
I previously tried a tmux-based Claude Code + Codex setup. That was a crude way to coordinate the two sessions, but MCP keeps each agent’s context isolated while still letting them cooperate.
Claude Code can find bugs in a PR and hand the fix generation to Codex, or do the reverse. In practice, though, model quality differences and context handoff loss still make a single model family with subagents more stable in many cases.
Claude Code’s subagent implementation details
Claude Code subagents are defined in Markdown files with YAML frontmatter. Each one has:
- an isolated context window
- a custom system prompt
- tool permissions, ranging from read-only to full edit access
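A definition might look roughly like this; the frontmatter field names follow Anthropic's documented subagent format, while the prompt body is purely illustrative:

```markdown
---
name: explore
description: Read-only codebase exploration. Use for locating relevant files.
tools: Read, Grep, Glob
---
You are a read-only exploration agent. Locate the files relevant to the
task, summarize what you find, and return file paths with short notes.
Never modify files.
```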
Since v2.1.19, Claude Code also has TeammateTool / Agent Teams. The older subagent model was one-way parent-to-child communication, but Agent Teams lets agents send direct messages through JSON inboxes on disk, and even compete for tasks from a shared list.
```mermaid
graph TD
    subgraph classic["Subagent style (classic)"]
        P1[Parent agent] -->|instruction| C1[Child A]
        P1 -->|instruction| C2[Child B]
        C1 -->|result| P1
        C2 -->|result| P1
    end
    subgraph teams["Agent Teams (v2.1.19+)"]
        L[Leader] -->|task| T1[Teammate A]
        L -->|task| T2[Teammate B]
        T1 <-->|JSON inbox| T2
        T1 -->|result| L
        T2 -->|result| L
    end
```
It is a rough but practical design. Because each agent is isolated at the OS process level, the system is resilient to crashes. It also works with tmux-spawned sessions, so you can actually watch multiple Claude Code instances cooperate in the terminal.
Orchestration frameworks tend to use in-memory message passing or event buses, while Claude Code leans on the most primitive IPC of all: the filesystem.
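That filesystem IPC can be sketched in a few lines: each agent owns an inbox directory, and a message is just a JSON file dropped into it. The directory layout and field names here are illustrative, not Claude Code's actual on-disk format.

```python
# Sketch of filesystem-based message passing between agents. One message
# per file; unique filenames avoid write collisions between processes,
# which is what makes this crude IPC tolerant of crashes.
import json, os, tempfile, uuid

ROOT = tempfile.mkdtemp()

def inbox(agent):
    path = os.path.join(ROOT, agent, "inbox")
    os.makedirs(path, exist_ok=True)
    return path

def send(sender, recipient, text):
    msg = {"from": sender, "text": text}
    name = f"{uuid.uuid4().hex}.json"
    with open(os.path.join(inbox(recipient), name), "w") as f:
        json.dump(msg, f)

def receive(agent):
    msgs = []
    for name in sorted(os.listdir(inbox(agent))):
        path = os.path.join(inbox(agent), name)
        with open(path) as f:
            msgs.append(json.load(f))
        os.remove(path)  # consume the message
    return msgs

send("teammate-a", "teammate-b", "I'll take the auth findings.")
print(receive("teammate-b"))  # one message from teammate-a
```

Because nothing but the filesystem is shared, any of these agents can die and restart without corrupting the others, and you can inspect the "message bus" with `ls` and `cat`.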