Claude’s 1M context window is now GA, integrated into the standard API at no extra cost
On March 13, 2026, Anthropic made the 1M‑token context window for Claude Opus 4.6 and Sonnet 4.6 generally available (GA).
“1M tokens” is roughly 750,000 words—about 15 Japanese paperbacks. Until now it was treated as beta, which meant you needed a special header to use it from the API and you paid a surcharge for long context. GA removes those constraints.
Because the last month has been a blur of updates around the Claude API and Claude Code, this post covers not only the 1M GA itself but the related feature additions as well.
What Changed With 1M GA
API changes
| Item | During beta | After GA |
|---|---|---|
| beta header | Required above 200K: `anthropic-beta: long-context-2025-01-01` | Not required (existing code keeps working) |
| Rate limits | Separate quota for long context | Unified into normal account limits |
| Image/PDF limit | 100 per request | Increased to 600 (6×) |
| Surcharge | Sonnet 4.5/4 had a long‑context surcharge | Opus 4.6/Sonnet 4.6 have no surcharge |
| Platforms | Claude API | Claude API + Azure Foundry + Vertex AI |
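To make the header change concrete, here is a minimal sketch of request construction before and after GA. The model ID (`claude-opus-4-6`) and the header value are taken from this post's tables and are assumptions, not verified against the live API:

```python
# Sketch: building a long-context request during beta vs. after GA.
# Model ID and beta header value follow this post's tables (assumptions).

def build_request(tokens_estimate: int, ga: bool = True) -> dict:
    """Return headers and body for a Messages API call above 200K tokens."""
    headers = {
        "x-api-key": "YOUR_API_KEY",
        "anthropic-version": "2023-06-01",
    }
    # During beta, requests above 200K tokens needed an opt-in header.
    if not ga and tokens_estimate > 200_000:
        headers["anthropic-beta"] = "long-context-2025-01-01"
    body = {
        "model": "claude-opus-4-6",  # assumed model ID
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": "..."}],
    }
    return {"headers": headers, "body": body}

beta_req = build_request(900_000, ga=False)
ga_req = build_request(900_000, ga=True)
print("anthropic-beta" in beta_req["headers"])  # True
print("anthropic-beta" in ga_req["headers"])    # False
```

The point of "existing code keeps working" is visible here: code that still sends the beta header is simply sending a now-unnecessary header, not a breaking one.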
Claude Code changes
Claude Code now supports the 1M context as well. It’s available to Max, Team, and Enterprise plan users and is enabled automatically when using Opus 4.6.
In practice, the most noticeable effect is fewer compaction events (automatic context summarization). One user reported a 15% reduction. Compaction summarizes and compresses context as conversations grow; it’s useful, but information loss is inevitable. If compaction rarely kicks in, agents are less likely to forget early instructions during long‑running work.
I run this blog with Claude Code and have used techniques like “taming an overgrown CLAUDE.md” and “externalizing conversation state,” as described in my Token Management Guide. With 1M GA some of those tricks may no longer be necessary. That said, fully packing 1M tokens every time will spike cost, so operational tuning like Session Management and Permission Settings remains important.
Beta constraints remain for older models
For models other than Opus 4.6 and Sonnet 4.6, 1M context remains in beta.
| Model | 1M context status | beta header | Surcharge |
|---|---|---|---|
| Opus 4.6 | GA | Not required | None |
| Sonnet 4.6 | GA | Not required | None |
| Sonnet 4.5 | Beta | Required | Yes |
| Sonnet 4 | Beta | Required | Yes |
Opus 4.5 and earlier Opus models only support up to 200K.
Pricing
There’s no additional fee for using long context. A 900K‑token request is billed at the same per‑token rate as a 9K‑token request.
| Model | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| Opus 4.6 | $5 | $25 |
| Sonnet 4.6 | $3 | $15 |
Filling the full 1M window with Opus 4.6 input costs about $5 per request. Not cheap if you do it constantly, but while other providers add surcharges for long context, Anthropic chose a flat rate.
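Flat-rate billing is easy to sanity-check with a small calculator using the rates in the table above (prices are the only inputs; everything else is plain arithmetic):

```python
# Flat per-token pricing from the table above: no long-context surcharge,
# so cost scales linearly with token count regardless of request size.
PRICES = {  # USD per million tokens: (input, output)
    "claude-opus-4-6": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the flat per-token rate."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# A 900K-token request is billed at the same per-token rate as a 9K one:
print(round(cost_usd("claude-opus-4-6", 900_000, 0), 6))
print(round(cost_usd("claude-opus-4-6", 9_000, 0), 6))
```

With a surcharge model, the first call would jump to a higher tier; here both are the same $5/M input rate.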
For reference, as noted in the Sonnet 4.6 release post, Sonnet 4.6 delivers equal or better coding performance than Opus 4.5 at one‑fifth the price. With long context also on a flat rate, its cost‑effectiveness for agents stands out even more.
MRCR v2 Benchmark
Opus 4.6 achieves 78.3% on MRCR v2 (Multi‑Hop Reading Comprehension and Retrieval), the top score among frontier models when evaluated at maximum context length.
MRCR v2 is a set of multi‑hop retrieval tasks. Think questions like “How does X stated in A relate to Y stated in B?” that you can answer only by combining information across multiple places. Using a full 1M tokens while maintaining accuracy is far harder than a simple needle‑in‑a‑haystack search.
Needle‑in‑a‑haystack measures “can you find specific information in a large text,” whereas MRCR v2 measures “can you relate the information you found and reason over it.” The latter is what most real‑world work requires.
Timeline of 1M Context
A quick look back at how 1M context rolled out.
```mermaid
graph TD
    A["Aug 2025<br/>Sonnet 4 beta begins"] --> B["Late Aug 2025<br/>Vertex AI support"]
    B --> C["Nov 2025<br/>Opus 4.5 released<br/>still 200K"]
    C --> D["Feb 5, 2026<br/>Opus 4.6 released<br/>1M beta support"]
    D --> E["Feb 17, 2026<br/>Sonnet 4.6 released<br/>1M beta support"]
    E --> F["Mar 13, 2026<br/>Opus 4.6 / Sonnet 4.6<br/>GA"]
    F --> G["No beta header<br/>No surcharge<br/>Media cap 600"]
```
GA came after roughly seven months of beta. Opus 4.5 (Nov 2025) was still capped at 200K, and Opus 4.6 jumped straight to 1M.
Where 1M Context Shines
Even 200K could handle fairly long inputs, but 1M qualitatively changes the use cases.
| Use case | Details |
|---|---|
| Feed entire codebases | Pass the entire project into the prompt and request refactors or reviews |
| Bulk analysis of large documents | Feed contracts, papers, and specs wholesale and ask cross‑cutting questions |
| Long‑running agents | Continue while retaining the full trace of tool calls, observations, and reasoning |
| High‑volume image/PDF processing | With the cap raised to 600, process hundreds of pages in one shot |
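For the "feed entire codebases" case, the practical question is whether a project fits in the window. A rough sketch, using the common 4-characters-per-token heuristic rather than the model's real tokenizer:

```python
# Sketch: packing a project into one prompt under a token budget.
# The 4-chars-per-token ratio is a rough heuristic, not an exact count.
from pathlib import Path

def pack_codebase(root: str, exts=(".py", ".md"), budget_tokens=1_000_000) -> str:
    """Concatenate source files until the estimated token budget is reached."""
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        text = path.read_text(errors="ignore")
        est = len(text) // 4  # crude character-based token estimate
        if used + est > budget_tokens:
            break  # stop before exceeding the window
        parts.append(f"=== {path} ===\n{text}")
        used += est
    return "\n\n".join(parts)
```

At 200K this loop stopped early for most real projects; at 1M, many small-to-medium codebases fit whole.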
Features GA’d in Feb–Mar
Beyond 1M GA, the Claude API saw a wave of features reach GA across February and March 2026. Some went largely unnoticed, but they pair nicely with 1M context.
Opus 4.6 additions (Feb 5)
The Opus 4.6 release itself included many changes.
- Adaptive Thinking: `thinking: {type: "adaptive"}` lets the model auto-tune its depth of thought. Manually specifying `budget_tokens` is now deprecated.
- Fast mode: A mode that makes output token generation up to 2.5× faster. It's in research preview and carries premium pricing.
- Data residency control: Specify the inference region with the `inference_geo` parameter. US-only runs cost 1.1×.
- No prefill support: The "prefill" technique (seeding the start of the assistant message to steer output) is no longer supported.
effort parameter (GA on Feb 5)
The successor to `budget_tokens`. It lets you control depth of reasoning with simple values like `low` / `medium` / `high`. Together with Opus 4.6's adaptive thinking, it removes the need for manual token-budget tuning.
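A sketch of what a request body might look like under these assumptions; the model ID and the exact placement of `effort` in the payload are guesses based on this post, not a verified API shape:

```python
# Sketch: choosing reasoning depth with effort instead of budget_tokens.
# Field placement is an assumption based on this post's description.

def make_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Messages-style request body with an effort level."""
    assert effort in ("low", "medium", "high")
    return {
        "model": "claude-opus-4-6",       # assumed model ID
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},  # adaptive thinking (Opus 4.6)
        "effort": effort,                  # replaces manual budget_tokens
        "messages": [{"role": "user", "content": prompt}],
    }

req = make_request("Summarize this contract.", effort="high")
```

The appeal is that `low`/`medium`/`high` is a stable knob: you no longer retune a token budget when switching models or task sizes.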
Compaction API (beta on Feb 5)
A server‑side context‑summarization feature. As the context window nears its limit, it automatically summarizes and compresses older conversation. Available with Opus 4.6. Useful for conversations so long that even 1M context isn’t enough (e.g., agents that run for hours).
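The real feature runs server-side, but the idea is easy to illustrate client-side: once history nears a limit, the oldest turns are replaced by a summary. A conceptual sketch (the `summarize` function here is a stand-in, not the API's summarizer):

```python
# Conceptual sketch of compaction: replace the oldest messages with one
# summary when the conversation exceeds a token limit. In the actual
# Compaction API this happens server-side; summarize() is a stand-in.

def compact(messages, limit_tokens, estimate, summarize):
    """Return messages unchanged if under limit, else summarize the older half."""
    total = sum(estimate(m) for m in messages)
    if total <= limit_tokens:
        return messages
    half = len(messages) // 2
    summary = summarize(messages[:half])       # compress the older half
    recent = messages[half:]                   # keep recent turns verbatim
    return [{"role": "user", "content": f"[Summary of earlier turns] {summary}"}] + recent
```

This also shows why fewer compactions matter: everything routed through `summarize` is lossy, while `recent` stays exact.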
Structured output (GA on Jan 29)
Guarantees responses conform to a JSON Schema via `output_config.format`. This reached GA on the Claude API in January, eliminating the need for a beta header.
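A sketch of a structured-output request under stated assumptions; the exact nesting inside `output_config.format` is inferred from this post, not a verified schema:

```python
# Sketch: requesting schema-conforming JSON output. The nesting under
# output_config.format is an assumption based on this post's description.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}

request = {
    "model": "claude-sonnet-4-6",  # assumed model ID
    "max_tokens": 1024,
    "output_config": {
        "format": {"type": "json_schema", "schema": invoice_schema},
    },
    "messages": [{"role": "user", "content": "Extract the invoice fields."}],
}
```

The practical win of a guarantee (versus prompt-level "respond in JSON") is that you can drop retry-and-reparse loops from downstream code.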
GA with the Sonnet 4.6 release (Feb 17)
The following shipped to GA alongside Sonnet 4.6.
| Feature | Summary |
|---|---|
| Web search tool | Built‑in web search for the Claude API. With dynamic filtering, you can run code to filter search results before adding them to the context window, improving accuracy while reducing token use |
| Web fetch tool | Fetches the content of a given URL for processing inside Claude |
| Code execution tool | Executes code in a sandbox. Free when used together with Web Search / Web Fetch |
| Programmatic tool calling | Instead of Claude calling tools step‑by‑step, you write Python that executes multiple tools in one shot. Benchmarks show a 37% reduction in token usage |
| Tool discovery tool | Rather than stuffing 50+ tool definitions into the prompt, load only what you need on demand. Cuts tokens by 85% (72K → 8.7K). Especially helpful when connecting lots of tools via MCP |
| Fine‑grained tool streaming | Streams tool‑use parameters without buffering |
These rolled out as betas across late 2025. Many users either didn’t notice they were still beta or weren’t aware of the beta header. With GA, the header is no longer required and these are part of the standard API.
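The token math behind tool discovery is worth seeing concretely. A toy sketch (the registry, token counts, and keyword matcher are all hypothetical; in a real setup the definitions might come from MCP servers and the selection step is model-driven):

```python
# Sketch of the tool-discovery idea: register many tool definitions but
# send only the relevant ones, instead of all of them up front.
# Registry contents and token counts are hypothetical.

TOOL_REGISTRY = {
    "search_flights": {"description": "Find flights", "tokens": 1500},
    "book_hotel": {"description": "Reserve a hotel", "tokens": 1400},
    "get_weather": {"description": "Weather lookup", "tokens": 800},
}

def select_tools(task: str) -> list[str]:
    """Naive keyword match standing in for the model-driven discovery step."""
    return [name for name in TOOL_REGISTRY if name.split("_")[-1] in task.lower()]

def prompt_tokens(tool_names) -> int:
    """Tokens spent on tool definitions in the prompt."""
    return sum(TOOL_REGISTRY[n]["tokens"] for n in tool_names)

upfront = prompt_tokens(TOOL_REGISTRY)                       # every definition
on_demand = prompt_tokens(select_tools("book a hotel in Kyoto"))  # only what's needed
```

With three tools the saving is modest; with the 50+ definitions mentioned above, loading on demand is where the quoted 85% reduction comes from.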
Automatic caching (Feb 19)
One `cache_control` field selects optimal cache points automatically. Previously you had to manage cache breakpoints by hand; now, as a conversation grows, cache points move forward automatically. Caching has a big impact for long-context requests, so it pairs well with 1M GA.
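A sketch of what opting in might look like; both the `"auto"` type value and the field's top-level placement are assumptions based on this post's one-field description, not a verified request shape:

```python
# Sketch: opting into automatic cache-point selection with one field.
# The "auto" value and top-level placement are assumptions from this post.
request = {
    "model": "claude-opus-4-6",  # assumed model ID
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "<large shared context, e.g. a whole codebase>"},
    ],
    "cache_control": {"type": "auto"},  # let the API place cache breakpoints
    "messages": [{"role": "user", "content": "Review module X."}],
}
```

Contrast with manual breakpoints: there, every turn you had to decide which prefix was stable enough to cache; here that judgment moves server-side.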
Sunset Schedule for Old Models
The 1M GA comes alongside an ongoing phase‑out of older models.
| Model | Status |
|---|---|
| Sonnet 3.7 | Discontinued on Feb 19, 2026 |
| Haiku 3.5 | Discontinued on Feb 19, 2026 |
| Opus 3 | Discontinued on Jan 5, 2026 |
| Haiku 3 | Scheduled for discontinuation on Apr 19, 2026 |
All models at or before Sonnet 3.5 have already been sunset. Researchers can continue accessing them through the external researcher access program.
Reference: 1M context is now generally available for Opus 4.6 and Sonnet 4.6 (Claude Developer Platform, Release Notes)