What Claude Code's multi-agent review says about subagents versus orchestration
Anthropic has released a research preview of multi-agent code review for Claude Code. When a PR opens, multiple agents run in parallel, find bugs, rank them by importance, and post both summary comments and inline comments to GitHub.
Why call it “multi-agent”?
Traditional automated code review tools depend on static rules or a single model reading the diff once. Claude Code takes a different approach.
When a PR arrives, multiple agents run in parallel and search for bugs independently. The results are cross-checked to remove false positives, then ranked by severity before being posted as both summary comments and inline comments.
The design clearly prefers depth over speed. Anthropic’s existing open-source GitHub Action is faster, but this review flow takes about 20 minutes on average.
What Anthropic sees in production
Anthropic also published internal usage numbers.
| PR size | Coverage | Average findings |
|---|---|---|
| 1000+ lines | 84% | 7.5 findings |
| under 50 lines | 31% | 0.5 findings |
Accuracy is reported as “over 99% correct.” The reason false positives stay low is the validation phase between agents.
Review coverage also changed dramatically. The share of PRs that received review went from 16% before the rollout to 54% after it. That is a major shift for smaller teams and open-source projects with high PR volume.
Examples of bugs it found
Anthropic shared two examples.
The first was a one-line change that looked minor but broke authentication. Human reviewers often skip changes that small, so the bug could easily have slipped through.
The second came from TrueNAS. The agent found a type mismatch in adjacent code touched by the change, not in the changed line itself. The bug caused the encryption-key cache to be wiped on every request, the kind of issue you miss if you only read the diff.
Cost and management
Depending on PR size and complexity, each review costs about 25. Pricing is token-based, so larger diffs cost more.
Anthropic provides monthly spend caps, repo-level enablement, and an analytics dashboard for cost and adoption tracking.
Setup is straightforward: an admin installs the GitHub App from Claude Code settings and chooses the target repositories. Developers do not need any extra configuration; the review runs automatically when they open a PR.
Availability
It is currently available as a research preview for Team and Enterprise plans only. Personal plans are not included.
The feature is still an early-stage preview, so accuracy and capabilities will keep improving. If cost is a concern, the practical way to start is to enable it only for a small repository or a specific team.
Subagents vs orchestration
The term “multi-agent” gets used everywhere in AI coding tools now, but the underlying architecture differs a lot. In practice there are two major families: subagent-style systems and orchestration-style systems.
Subagent-style
A parent agent spawns child agents, and each child works independently within its own context window. The parent only receives the final output. Claude Code is built this way: the main agent defines specialized subagents such as Explore and Plan in Markdown files and spawns them when needed.
The advantage is simplicity. The interface between parent and child is just “task in, result out,” so you do not need to design a communication protocol between agents.
```mermaid
graph TD
    A[Main agent] -->|task| B["Subagent A<br/>Explore: read-only"]
    A -->|task| C["Subagent B<br/>Plan: design-focused"]
    A -->|task| D["Subagent C<br/>General: can edit"]
    B -->|result only| A
    C -->|result only| A
    D -->|result only| A
```
Anthropic says a configuration using Claude Opus 4 as the lead agent and Sonnet 4 as subagents improved performance by 90.2% compared with Opus 4 alone.
Orchestration-style
An orchestrator manages the whole workflow as a graph or state machine and controls state sharing and message passing between agents. LangGraph and CrewAI are examples.
The biggest difference from subagents is that orchestrated agents can share state. LangGraph uses a shared state object; CrewAI can share short-term, long-term, and entity memory. The downside is that the developer has to design the workflow structure up front.
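The shared-state idea can be sketched without any framework: node functions that all read and write one state object, wired into a workflow declared up front. The node names and state fields below are illustrative, not LangGraph's or CrewAI's actual APIs.

```python
# Dependency-free sketch of shared-state orchestration. All node names
# and state fields are illustrative, not any real framework's API.

def find_bugs(state: dict) -> dict:
    # Each node reads and writes the same shared state object.
    state["findings"] = ["null deref in auth.py", "off-by-one in pager.py"]
    return state

def verify(state: dict) -> dict:
    # A later node can see everything earlier nodes wrote.
    state["verified"] = [f for f in state["findings"] if "auth" in f]
    return state

def report(state: dict) -> dict:
    state["report"] = f"{len(state['verified'])} confirmed finding(s)"
    return state

# The workflow must be declared up front: an ordered pipeline here,
# standing in for the directed graph a real orchestrator would hold.
WORKFLOW = [find_bugs, verify, report]

def run(state: dict) -> dict:
    for node in WORKFLOW:
        state = node(state)
    return state

print(run({})["report"])  # 1 confirmed finding(s)
```

Contrast with the subagent model above, where each child would build its findings in isolation and only the final result would cross the boundary.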
| View | Subagent style | Orchestration style |
|---|---|---|
| State sharing | none; children only see the parent’s final output | yes, through shared state or memory |
| Workflow definition | not required; the parent spawns children dynamically | required; the graph must be designed in advance |
| Agent-to-agent communication | one-way from parent to child | two-way with handoffs and messaging |
| Best for | parallel exploration, isolated specialist tasks | workflows with complex inter-step dependencies |
| Context management | isolated per agent | shared state can also cause contamination |
Claude Code’s review flow is a good fit for the subagent model. Code review naturally splits into “find”, “verify”, and “report” phases, and those phases do not need much shared state. It is enough to collect all findings at the end.
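Under that reading, the three phases reduce to a small pipeline. The sketch below uses hypothetical finder functions standing in for the LLM agents, and treats "reported by more than one finder" as a stand-in for the cross-check that removes false positives.

```python
# Illustrative find -> verify -> report pipeline. The finder functions
# are placeholders for independent review agents.
from concurrent.futures import ThreadPoolExecutor

def finder_a(diff): return [("high", "auth check removed")]
def finder_b(diff): return [("low", "unused import"), ("high", "auth check removed")]

def find(diff):
    # Phase 1: independent agents search the diff in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda f: f(diff), [finder_a, finder_b])
    return [item for sub in results for item in sub]

def verify(findings):
    # Phase 2: cross-check -- keep only findings reported by more than
    # one agent, a crude proxy for the validation pass.
    return [f for f in set(findings) if findings.count(f) > 1]

def report(findings):
    # Phase 3: rank by severity before posting.
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(findings, key=lambda f: order[f[0]])

print(report(verify(find("...diff..."))))  # [('high', 'auth check removed')]
```

Note that no state crosses between the finders: the only shared moment is the final collection, which is exactly the shape the subagent model provides for free.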
Where the major frameworks stand
OpenAI Agents SDK
The experimental 2024 Swarm framework was replaced by the production-oriented Agents SDK in March 2025. At DevDay 2025, OpenAI also announced AgentKit, which adds visual tooling and enterprise features.
The main primitives are Agent, Handoff, and Guardrails. “Handoff” is OpenAI’s term for explicitly passing control from one agent to another, which is different from the subagent pattern where the child simply returns a result.
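The distinction can be sketched in a few lines: an agent returns either a final answer or the name of another agent to transfer control to. This illustrates the concept only; it is not the actual Agents SDK API.

```python
# Sketch of the handoff idea. Agent names and the tuple protocol are
# illustrative, not the OpenAI Agents SDK.

def triage_agent(task):
    # Instead of answering, triage can hand the conversation off.
    if "refund" in task:
        return ("handoff", "billing_agent")
    return ("answer", "I can help with that directly.")

def billing_agent(task):
    return ("answer", f"Refund started for: {task}")

AGENTS = {"triage_agent": triage_agent, "billing_agent": billing_agent}

def run(task, agent="triage_agent"):
    # Control moves between agents until one produces a final answer;
    # a subagent, by contrast, always returns to its parent.
    while True:
        kind, value = AGENTS[agent](task)
        if kind == "answer":
            return value
        agent = value  # follow the handoff

print(run("refund for order 42"))  # Refund started for: refund for order 42
```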
Microsoft AutoGen -> Microsoft Agent Framework
AutoGen was rewritten in January 2025 as a fully asynchronous, event-driven v0.4. Microsoft has since announced Microsoft Agent Framework, which unifies AutoGen and Semantic Kernel. AutoGen itself will only receive critical bug fixes.
The community fork AG2, split from AutoGen in November 2024, continues independently. Conversation-based agent design is flexible, but the back-and-forth negotiation between agents also makes it the highest-latency approach of the frameworks covered here.
CrewAI
An independent Python framework that does not depend on LangChain. It has more than 20,000 GitHub stars and over 100,000 certified developers.
CrewAI uses a two-layer design: Crews for autonomous teams and Flows for event-driven orchestration. It is said to run comparable workflows 2-3x faster than LangGraph.
LangGraph
LangGraph models workflows as directed graphs of nodes and edges. It has more than 47 million PyPI downloads, making it one of the largest ecosystems.
Its key differentiator is checkpointing: workflows can be paused and resumed across environments. That makes it strong for long-running jobs and human-in-the-loop systems, but the graph design cost is high for simple tasks.
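The checkpointing idea itself is simple to sketch: persist the workflow state after every step so a run can stop and resume anywhere. The code below mimics the concept only, not LangGraph's checkpointer API.

```python
# Dependency-free sketch of checkpointed workflow execution: the state
# is written to disk after every step, so a crashed or paused run can
# resume from the last completed step. Step names are illustrative.
import json, os, tempfile

STEPS = [("double", lambda s: s * 2), ("inc", lambda s: s + 1)]

def run(path, value=None):
    # Load the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            ckpt = json.load(f)
    else:
        ckpt = {"done": 0, "value": value}
    for name, fn in STEPS[ckpt["done"]:]:
        ckpt["value"] = fn(ckpt["value"])
        ckpt["done"] += 1
        with open(path, "w") as f:
            json.dump(ckpt, f)  # checkpoint after each step
    return ckpt["value"]

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
print(run(path, 3))  # 7: (3 * 2) + 1
# A second call finds the finished checkpoint and returns the same value.
print(run(path))     # 7
```

The price is visible even in the sketch: the step list must exist as explicit, resumable structure before anything runs, which is the graph-design cost mentioned above.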
Google ADK + A2A
Google’s Agent Development Kit is code-first and optimized for Gemini while remaining model-agnostic. The important part is the A2A (Agent-to-Agent) protocol, an open standard released in April 2025. It is meant to let agents built on different frameworks talk to each other.
Pairing Codex CLI with this world
OpenAI’s Codex CLI can be exposed as an MCP server, so its codex() and codex-reply() tools can be called from outside. That means it can be used as one node in an Agents SDK orchestration, or Claude Code can call Codex through MCP.
I previously tried a tmux-based Claude Code + Codex setup. That was a crude way to coordinate the two sessions, but MCP keeps each agent’s context isolated while still letting them cooperate.
Claude Code can find bugs in a PR and hand the fix generation to Codex, or do the reverse. In practice, though, model quality differences and context handoff loss still make a single model family with subagents more stable in many cases.
Claude Code’s subagent implementation details
Claude Code subagents are defined in Markdown files with YAML frontmatter. Each one has:
- an isolated context window
- a custom system prompt
- tool permissions, ranging from read-only to full edit access
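A definition might look roughly like this; the frontmatter field names follow Anthropic's documented subagent format, while the prompt body is purely illustrative:

```markdown
---
name: explore
description: Read-only codebase exploration. Use for locating relevant files.
tools: Read, Grep, Glob
---
You are a read-only exploration agent. Locate the files relevant to the
task, summarize what you find, and return file paths with short notes.
Never modify files.
```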
Since v2.1.19, Claude Code also has TeammateTool / Agent Teams. The older subagent model was one-way parent-to-child communication, but Agent Teams lets agents send direct messages through JSON inboxes on disk, and even compete for tasks from a shared list.
```mermaid
graph TD
    subgraph classic["Subagent style (classic)"]
        P1[Parent agent] -->|instruction| C1[Child A]
        P1 -->|instruction| C2[Child B]
        C1 -->|result| P1
        C2 -->|result| P1
    end
    subgraph teams["Agent Teams (v2.1.19+)"]
        L[Leader] -->|task| T1[Teammate A]
        L -->|task| T2[Teammate B]
        T1 <-->|JSON inbox| T2
        T1 -->|result| L
        T2 -->|result| L
    end
```
It is a rough but practical design. Because each agent is isolated at the OS process level, the system is resilient to crashes. It also works with tmux-spawned sessions, so you can actually watch multiple Claude Code instances cooperate in the terminal.
Orchestration frameworks tend to use in-memory message passing or event buses, while Claude Code leans on the most primitive IPC of all: the filesystem.
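That filesystem IPC can be sketched in a few lines: each agent owns an inbox directory, and a message is just a JSON file dropped into it. The directory layout and field names here are illustrative, not Claude Code's actual on-disk format.

```python
# Sketch of filesystem-based message passing between agents. One message
# per file; unique filenames avoid write collisions between processes,
# which is what makes this crude IPC tolerant of crashes.
import json, os, tempfile, uuid

ROOT = tempfile.mkdtemp()

def inbox(agent):
    path = os.path.join(ROOT, agent, "inbox")
    os.makedirs(path, exist_ok=True)
    return path

def send(sender, recipient, text):
    msg = {"from": sender, "text": text}
    name = f"{uuid.uuid4().hex}.json"
    with open(os.path.join(inbox(recipient), name), "w") as f:
        json.dump(msg, f)

def receive(agent):
    msgs = []
    for name in sorted(os.listdir(inbox(agent))):
        path = os.path.join(inbox(agent), name)
        with open(path) as f:
            msgs.append(json.load(f))
        os.remove(path)  # consume the message
    return msgs

send("teammate-a", "teammate-b", "I'll take the auth findings.")
print(receive("teammate-b"))  # one message from teammate-a
```

Because nothing but the filesystem is shared, any of these agents can die and restart without corrupting the others, and you can inspect the "message bus" with `ls` and `cat`.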