
Claude’s 1M context window is now GA, integrated into the standard API at no extra cost

On March 13, 2026, Anthropic made the 1M‑token context window for Claude Opus 4.6 and Sonnet 4.6 generally available (GA).

“1M tokens” is roughly 750,000 words—about 15 Japanese paperbacks. Until now it was treated as beta, which meant you needed a special header to use it from the API and you paid a surcharge for long context. GA removes those constraints.

Because the last month has been a blur of updates around the Claude API and Claude Code, this post covers not only the 1M GA itself but the related feature additions as well.

What Changed With 1M GA

API changes

| Item | During beta | After GA |
|---|---|---|
| Beta header | Required above 200K: `anthropic-beta: long-context-2025-01-01` | Not required (existing code keeps working) |
| Rate limits | Separate quota for long context | Unified into normal account limits |
| Image/PDF limit | 100 per request | Increased to 600 (6×) |
| Surcharge | Sonnet 4.5/4 had a long‑context surcharge | Opus 4.6/Sonnet 4.6 have no surcharge |
| Platforms | Claude API | Claude API + Azure Foundry + Vertex AI |
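The header change is the main code-level difference. Here is a minimal sketch of a post-GA request, assuming the model id `claude-opus-4-6` and using the header value quoted above; both may differ in your SDK version:

```python
# Sketch of a long-context request after GA: the payload is unchanged,
# and the beta header is simply no longer needed.
import json

payload = {
    "model": "claude-opus-4-6",  # model id is an assumption
    "max_tokens": 4096,
    "messages": [
        {"role": "user", "content": "<up to ~1M tokens of documents>\n\nSummarize the key points."}
    ],
}
headers = {"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01"}
# Before GA, requests above 200K tokens additionally needed:
# headers["anthropic-beta"] = "long-context-2025-01-01"
print(json.dumps(payload, indent=2)[:60])
```

Existing code that still sends the beta header keeps working; it is simply ignored for Opus 4.6 and Sonnet 4.6.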

Claude Code changes

Claude Code now supports the 1M context as well. It’s available to Max, Team, and Enterprise plan users and is enabled automatically when using Opus 4.6.

In practice, the most noticeable effect is fewer compaction events (automatic context summarization). One user reported a 15% reduction. Compaction summarizes and compresses context as conversations grow; it’s useful, but information loss is inevitable. If compaction rarely kicks in, agents are less likely to forget early instructions during long‑running work.

I run this blog with Claude Code and have used techniques like “taming an overgrown CLAUDE.md” and “externalizing conversation state,” as described in my Token Management Guide. With 1M GA some of those tricks may no longer be necessary. That said, fully packing 1M tokens every time will spike cost, so operational tuning like Session Management and Permission Settings remains important.

Beta constraints remain for older models

For models other than Opus 4.6 and Sonnet 4.6, 1M context remains in beta.

| Model | 1M context status | Beta header | Surcharge |
|---|---|---|---|
| Opus 4.6 | GA | Not required | None |
| Sonnet 4.6 | GA | Not required | None |
| Sonnet 4.5 | Beta | Required | Yes |
| Sonnet 4 | Beta | Required | Yes |

Opus 4.5 and earlier Opus models only support up to 200K.

Pricing

There’s no additional fee for using long context. A 900K‑token request is billed at the same per‑token rate as a 9K‑token request.

| Model | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| Opus 4.6 | $5 | $25 |
| Sonnet 4.6 | $3 | $15 |
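Because the rate is flat, cost scales linearly with token count and nothing else. A quick sketch of the arithmetic, using the prices from the table above:

```python
# Flat-rate cost estimate: no long-context surcharge, so a 900K-token
# request costs exactly 100x a 9K-token request.
PRICES = {  # USD per million tokens, from the pricing table
    "opus-4.6": {"input": 5.0, "output": 25.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD at the same per-token rate regardless of context length."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(request_cost("opus-4.6", 900_000, 4_000))  # 4.6
print(request_cost("opus-4.6", 9_000, 4_000))    # 0.145
```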

Filling the window with Opus 4.6 costs $5 just for input; with Sonnet 4.6 it’s $3. Not cheap, but while other providers add surcharges for long context, Anthropic chose a flat rate.

For reference, as noted in the Sonnet 4.6 release post, Sonnet 4.6 delivers equal or better coding performance than Opus 4.5 at one‑fifth the price. With long context also on a flat rate, its cost‑effectiveness for agents stands out even more.

MRCR v2 Benchmark

Opus 4.6 achieves 78.3% on MRCR v2 (Multi‑Hop Reading Comprehension and Retrieval), the top score among frontier models when evaluated at maximum context length.

MRCR v2 is a set of multi‑hop retrieval tasks. Think questions like “How does X stated in A relate to Y stated in B?” that you can answer only by combining information across multiple places. Using a full 1M tokens while maintaining accuracy is far harder than a simple needle‑in‑a‑haystack search.

Needle‑in‑a‑haystack measures “can you find specific information in a large text,” whereas MRCR v2 measures “can you relate the information you found and reason over it.” The latter is what most real‑world work requires.

Timeline of 1M Context

A quick look back at how 1M context rolled out.

```mermaid
graph TD
    A["Aug 2025<br/>Sonnet 4 beta begins"] --> B["Late Aug 2025<br/>Vertex AI support"]
    B --> C["Nov 2025<br/>Opus 4.5 released<br/>still 200K"]
    C --> D["Feb 5, 2026<br/>Opus 4.6 released<br/>1M beta support"]
    D --> E["Feb 17, 2026<br/>Sonnet 4.6 released<br/>1M beta support"]
    E --> F["Mar 13, 2026<br/>Opus 4.6 / Sonnet 4.6<br/>GA"]
    F --> G["No beta header<br/>No surcharge<br/>Media limit 600"]
```

GA came after roughly seven months of beta. Opus 4.5 (Nov 2025) was still capped at 200K, and Opus 4.6 jumped straight to 1M.

Where 1M Context Shines

Even 200K could handle fairly long inputs, but 1M qualitatively changes the use cases.

| Use case | Details |
|---|---|
| Feed entire codebases | Pass the entire project into the prompt and request refactors or reviews |
| Bulk analysis of large documents | Feed contracts, papers, and specs wholesale and ask cross‑cutting questions |
| Long‑running agents | Continue while retaining the full trace of tool calls, observations, and reasoning |
| High‑volume image/PDF processing | With the cap raised to 600, process hundreds of pages in one shot |
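For the codebase case, the workflow is usually just concatenation plus a token budget. A minimal sketch (a hypothetical helper, not an official tool), using a rough 4-characters-per-token heuristic:

```python
# Pack a project's source files into one prompt string while staying
# under a token budget, leaving headroom for the question and the reply.
from pathlib import Path

def pack_codebase(root: str, exts=(".py", ".md"), token_budget=900_000) -> str:
    parts, chars = [], 0
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(errors="ignore")
            if (chars + len(text)) / 4 > token_budget:  # ~4 chars per token
                break
            parts.append(f"## {path.relative_to(root)}\n{text}")
            chars += len(text)
    return "\n\n".join(parts)
```

The packed string then goes into a single request; for real budgets, count tokens with the API’s token-counting endpoint rather than this heuristic.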

Features GA’d in Feb–Mar

Beyond 1M GA, the Claude API saw a wave of features reach GA across February and March 2026. Some went largely unnoticed, but they pair nicely with 1M context.

Opus 4.6 additions (Feb 5)

The Opus 4.6 release itself included many changes.

  • Adaptive Thinking: thinking: {type: "adaptive"} lets the model auto‑tune depth of thought. Manually specifying budget_tokens is now deprecated.
  • Fast mode: A mode that makes output token generation up to 2.5× faster. It’s in research preview and carries premium pricing.
  • Data residency control: Specify the inference region with the inference_geo parameter. US‑only runs cost 1.1×.
  • No prefill support: The “prefill” technique (seeding the start of the assistant message to steer output) is no longer supported.

effort parameter (GA on Feb 5)

The successor to budget_tokens. It lets you control depth of reasoning with simple values like low / medium / high. Together with Opus 4.6’s adaptive thinking, it removes the need for manual token‑budget tuning.
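A hedged sketch of what a request combining the two might look like. Only `thinking: {type: "adaptive"}` and the `low` / `medium` / `high` effort values come from the release notes; the exact field placement and the model id are assumptions:

```python
# Hypothetical request shape: adaptive thinking plus an effort level,
# replacing manual budget_tokens tuning.
request = {
    "model": "claude-opus-4-6",  # model id is an assumption
    "max_tokens": 2048,
    "thinking": {"type": "adaptive"},  # model picks its own depth of thought
    "effort": "high",  # or "low" / "medium"; field placement is an assumption
    "messages": [{"role": "user", "content": "Debug this stack trace: ..."}],
}
```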

Compaction API (beta on Feb 5)

A server‑side context‑summarization feature. As the context window nears its limit, it automatically summarizes and compresses older conversation. Available with Opus 4.6. Useful for conversations so long that even 1M context isn’t enough (e.g., agents that run for hours).

Structured output (GA on Jan 29)

Guarantees responses conform to a JSON Schema via output_config.format. This reached GA on the Claude API in January, eliminating the need for a beta header.
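A sketch of what such a request could look like. The `output_config.format` path is from the article; the inner field names (`"json_schema"`, `"schema"`) are assumptions for illustration:

```python
# Hypothetical structured-output request: the response is constrained
# to match the given JSON Schema.
import json

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "severity"],
}
payload = {
    "model": "claude-sonnet-4-6",  # model id is an assumption
    "output_config": {"format": {"type": "json_schema", "schema": schema}},
    "messages": [{"role": "user", "content": "Classify this bug report: ..."}],
}
print(json.dumps(payload["output_config"], indent=2)[:40])
```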

GA with the Sonnet 4.6 release (Feb 17)

The following shipped to GA alongside Sonnet 4.6.

| Feature | Summary |
|---|---|
| Web search tool | Built‑in web search for the Claude API. With dynamic filtering, you can run code to filter search results before adding them to the context window, improving accuracy while reducing token use |
| Web fetch tool | Fetches the content of a given URL for processing inside Claude |
| Code execution tool | Executes code in a sandbox. Free when used together with Web Search / Web Fetch |
| Programmatic tool calling | Instead of Claude calling tools step by step, you write Python that executes multiple tools in one shot. Benchmarks show a 37% reduction in token usage |
| Tool discovery tool | Rather than stuffing 50+ tool definitions into the prompt, load only what you need on demand. Cuts tokens by 85% (72K → 8.7K). Especially helpful when connecting lots of tools via MCP |
| Fine‑grained tool streaming | Streams tool‑use parameters without buffering |

These rolled out as betas across late 2025. Many users either didn’t notice they were still beta or weren’t aware of the beta header. With GA, the header is no longer required and these are part of the standard API.

Automatic caching (Feb 19)

One cache_control field selects optimal cache points automatically. Previously you had to manage cache breakpoints by hand; as a conversation grows, cache points move forward automatically. Caching has a big impact for long‑context requests, so it pairs well with 1M GA.
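A hedged sketch of a message using a single marker. The `"ephemeral"` value is carried over from the manual caching API and may differ under the automatic mode; treat the exact shape as an assumption:

```python
# Hypothetical message: one cache_control marker on the large stable
# prefix, with breakpoint selection left to the server.
message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "<large reference document>",
            "cache_control": {"type": "ephemeral"},  # value is an assumption
        },
        {"type": "text", "text": "Answer using the document above."},
    ],
}
```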

Sunset Schedule for Old Models

The 1M GA comes alongside an ongoing phase‑out of older models.

| Model | Status |
|---|---|
| Sonnet 3.7 | Discontinued on Feb 19, 2026 |
| Haiku 3.5 | Discontinued on Feb 19, 2026 |
| Opus 3 | Discontinued on Jan 5, 2026 |
| Haiku 3 | Scheduled for discontinuation on Apr 19, 2026 |

All models at or before Sonnet 3.5 have already been sunset. Researchers can continue accessing them through the external researcher access program.

Source: 1M context is now generally available for Opus 4.6 and Sonnet 4.6 | Claude Developer Platform, Release Notes