
OpenAI ships GPT-5.5 and GPT-5.5 Pro on the API

Ikesan

OpenAI’s API Changelog lists GPT-5.5 and GPT-5.5 Pro as of April 24, 2026.
It is also being discussed on Hacker News, but the place to start is the official Changelog, the GPT-5.5 model page, and the GPT-5.5 Pro model page.

This is not just another model drop. The default for reasoning effort, image input resolution, the caching mechanism, the long-context pricing tier, and the API surface for the Pro version have all changed.
Swap GPT-5.4-class models for these as a drop-in replacement and you may hit unexpected latency or billing behavior before you even get to quality differences.

Designed around the Responses API

The Changelog publication date is April 24, 2026.
gpt-5.5 is available via the Chat Completions API, the Responses API, and the Batch API, while gpt-5.5-pro is restricted to the Responses API (and Batch).

| Model | Primary use | Main API surface | Input | Output | Context (tokens) |
| --- | --- | --- | --- | --- | --- |
| gpt-5.5 | Complex tasks, coding, tool use | Chat / Responses / Batch | Text, image | Text | 1,050,000 |
| gpt-5.5-pro | Heavier reasoning, hard problems with thinking time | Responses / Batch | Text, image | Text | 1,050,000 |

Maximum output is 128,000 tokens for both, and the knowledge cutoff is December 1, 2025.
1M-class context is becoming common: Anthropic GA’d the Claude 1M context window, and Xiaomi made MiMo-V2.5 with 1M context available early. With OpenAI, though, the big deal is that the 1M context can be used together with the Responses API’s built-in tool stack.

According to the Changelog, GPT-5.5 supports Structured Outputs, Function Calling, and Prompt Caching, plus Tool Search, Built-in Computer Use, and Hosted Shell.
The official Using GPT-5.5 guide also recommends the Responses API for reasoning and tool calling use cases.
OpenAI-compatible endpoints like the NVIDIA NIM free inference API have spread on top of Chat Completions, but to get the most out of GPT-5.5 you end up using Responses API features.

The Pro version is a specialized async model

GPT-5.5 Pro is not just a higher-quality version of the regular model — its API usage pattern is different.
It spends more compute on producing precise responses, so a request can take several minutes; OpenAI recommends using Background mode to avoid timeouts.

| Item | GPT-5.5 | GPT-5.5 Pro |
| --- | --- | --- |
| Input price | $5.00 / 1M tokens | $30.00 / 1M tokens |
| Cached input | $0.50 / 1M tokens | No discount |
| Output price | $30.00 / 1M tokens | $180.00 / 1M tokens |
| Streaming | Yes | No |
| Apply Patch | Yes | No |
| Skills | Yes | No |
| Computer Use | Yes | No |
| Tool Search | Yes | No |

Streaming and several tool features are unsupported, so for an agent that uses a wide range of tools, plain GPT-5.5 is often the easier fit.
GPT-5.5 Pro also costs 6x the regular model on both input and output tokens, with no cached-input discount, so workloads that reuse a long shared prompt will see a sharp cost gap.
Like when Codex shifted from per-message to per-token billing, the choice has to factor in not just the unit model price but also cache efficiency and waiting time.
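Given the multi-minute response times, a background-mode call for gpt-5.5-pro might be sketched as follows. The `background=True` flag and the status names follow OpenAI's documented background mode for the Responses API; treat the exact status set, and the function name `run_pro_in_background`, as assumptions here.

```python
import time

# Statuses we treat as terminal; assumed from OpenAI's background-mode docs.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def is_done(status: str) -> bool:
    """True once a background response has stopped processing."""
    return status in TERMINAL_STATUSES


def run_pro_in_background(prompt: str, poll_seconds: float = 5.0) -> str:
    """Live sketch (requires OPENAI_API_KEY): submit, then poll."""
    from openai import OpenAI

    client = OpenAI()
    job = client.responses.create(
        model="gpt-5.5-pro",
        input=prompt,
        background=True,  # returns immediately instead of holding the connection
    )
    while not is_done(job.status):
        time.sleep(poll_seconds)
        job = client.responses.retrieve(job.id)
    return job.output_text
```

In a real agent you would persist `job.id` and poll from a worker, rather than blocking a request thread for several minutes.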

reasoning.effort now defaults to medium

For GPT-5.5, the default reasoning.effort is medium.
For latency-sensitive workloads or light information extraction, leaving the default in place may burn extra reasoning time.
The official guide suggests that even when latency matters, if there is still tool use or multi-step decision-making involved, you should try low first instead of jumping straight to none.

For complex tasks, on the other hand, high or xhigh become candidates — but with vague stopping conditions, excessive exploration can backfire.
Rather than treating GPT-5.5 as a drop-in replacement for the GPT-5.4 line, it is worth re-measuring low / medium / high on representative tasks.
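One way to do that re-measurement is to generate the same request at each effort level and compare latency and output quality side by side. A minimal sketch, assuming the `reasoning.effort` parameter shape from the Responses API and the effort levels named in this article (`build_reasoning_request` and `sweep` are our own names):

```python
def build_reasoning_request(model: str, prompt: str, effort: str) -> dict:
    """Build a Responses API payload with an explicit reasoning effort."""
    allowed = {"none", "low", "medium", "high", "xhigh"}  # levels named in the article
    if effort not in allowed:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {"model": model, "input": prompt, "reasoning": {"effort": effort}}


def sweep(prompts, efforts=("low", "medium", "high")):
    """Yield one payload per (effort, prompt) pair for a benchmark run.

    Send each with client.responses.create(**req) and record latency,
    token usage, and task success per effort level.
    """
    for effort in efforts:
        for p in prompts:
            yield build_reasoning_request("gpt-5.5", p, effort)
```

Running the sweep on a handful of representative tasks is usually enough to see whether the medium default is buying anything on your workload.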

Change in default image input behavior

When image_detail is unspecified or set to auto, GPT-5.5 now behaves like original, retaining more visual information.
Images are passed without resizing up to 10,240,000 pixels, or up to a 6,000-pixel dimension cap.
(With high explicitly set, the cap drops to 2,500,000 pixels or a 2,048-pixel dimension cap.)

This helps screenshot analysis with Computer Use and similar workloads, but workloads that send a lot of images will see input token counts rise.
If you want predictable cost and latency, set image_detail explicitly.
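To predict which of your images will pass through unresized, you can check them against the caps quoted above. A small sketch using the pixel figures from the changelog (the helper names are ours; the `input_image` content-part shape follows the Responses API):

```python
# (max total pixels, max single dimension) per detail mode, per the changelog.
CAPS = {
    "auto": (10_240_000, 6_000),  # new default, behaves like "original"
    "high": (2_500_000, 2_048),   # explicit high drops the caps
}


def will_be_resized(width: int, height: int, detail: str = "auto") -> bool:
    """True if an image of these dimensions exceeds either cap."""
    max_pixels, max_dim = CAPS[detail]
    return width * height > max_pixels or max(width, height) > max_dim


def image_part(url: str, detail: str) -> dict:
    """Content part with an explicit detail setting, for predictable cost."""
    return {"type": "input_image", "image_url": url, "detail": detail}
```

For example, a 2560x1440 screenshot fits under the auto caps but exceeds the high-detail pixel cap, so the two modes will tokenize it differently.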

Caching and long-context constraints

According to the Changelog, GPT-5.5 caching only works with extended prompt caching; in-memory prompt caching is not supported.
The regular model’s cached input is $0.50 per 1M tokens, one-tenth of regular input, so an agent design that reuses the same prefix and tool definitions will see effective unit cost change significantly.
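The effect on unit cost is easy to quantify. A back-of-the-envelope calculator using the gpt-5.5 prices quoted above:

```python
PRICE_IN = 5.00      # $ per 1M fresh input tokens (gpt-5.5, per the article)
PRICE_CACHED = 0.50  # $ per 1M cached input tokens


def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of one request's input, split fresh vs. cached."""
    fresh = total_tokens - cached_tokens
    return (fresh * PRICE_IN + cached_tokens * PRICE_CACHED) / 1_000_000
```

If half of a 1M-token prompt is a reused prefix that hits the cache, the input cost drops from $5.00 to $2.75 per request, which is why prefix-stable agent designs matter at these prices.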

That said, for prompts above 272K input tokens, GPT-5.5 requests on the standard, Batch, and Flex tiers are billed at 2x the input rate and 1.5x the output rate.
Just because the context limit is 1M does not mean stuffing everything in is cheap, so a retrieval layer that narrows down information up front becomes necessary.
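The surcharge math is worth internalizing before relying on the 1M window. A sketch of the billing, assuming (as the changelog's wording suggests) that the 2x/1.5x multipliers apply to the whole request once input crosses the threshold:

```python
THRESHOLD = 272_000           # input tokens; surcharge boundary per the article
IN_RATE, OUT_RATE = 5.00, 30.00  # $ per 1M tokens, gpt-5.5 standard tier


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, with the long-context surcharge.

    Assumption: the multipliers apply to all tokens in the request,
    not just those beyond the threshold.
    """
    in_mult, out_mult = (2.0, 1.5) if input_tokens > THRESHOLD else (1.0, 1.0)
    return (input_tokens * IN_RATE * in_mult
            + output_tokens * OUT_RATE * out_mult) / 1_000_000
```

Under that assumption, a 300K-token prompt costs $3.00 in input alone versus $1.36 for a 272K-token prompt, so trimming retrieval output to stay under the threshold pays for itself quickly.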