
Amazon Bedrock Mantle's OpenAI-compatible API is now generally available

Ikesan

On March 3, 2026, AWS announced general availability of OpenAI API compatibility in Amazon Bedrock's distributed inference engine, Mantle. This means existing OpenAI SDK code, such as the Python openai package and the openai npm package, can run on Amazon Bedrock with only minimal changes.

What changes

Previously, with Bedrock's native API, the InvokeModel request format differed by model provider: Anthropic models and Mistral models required different SDKs and request formats. With OpenAI API compatibility, that fragmentation is largely eliminated.

The only required changes are switching the base URL (endpoint) and configuring AWS credentials. Prompt logic, message structure, and parameter names can stay OpenAI API compliant.

Compatible models and API

The Mantle engine's OpenAI-compatible API supports open-weight models from multiple providers.

  • Google (Gemma series, etc.)
  • DeepSeek (DeepSeek-R1 etc.)
  • Mistral (Mistral Large, Mistral 7B, etc.)
  • Moonshot AI
  • MiniMax
  • Models provided by NVIDIA
  • OpenAI (open-weight GPT OSS models)

There are currently two supported APIs: the Chat Completions API and the Responses API. You can run Mistral Large via Bedrock by simply replacing the model specified in the OpenAI SDK (e.g. gpt-4o) with the Bedrock model identifier (e.g. mistral.mistral-large-2411-v1:0).

Projects API and IAM integration

This announcement also includes the addition of the Projects API. When using Bedrock across multiple applications, environments, and teams, you can manage resources separately on a project-by-project basis.

  • IAM-based access control: IAM roles and policies can be set for each project
  • Cost visualization: costs can be tracked per project using tags
  • No additional charge: the Projects API itself is free; only model inference is billed

In enterprise environments, different models are often used for development, staging, and production. Separating projects makes it easier to differentiate model usage for each environment.
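If projects are surfaced through cost-allocation tags as the announcement suggests, per-project cost tracking could be sketched with the AWS Cost Explorer API. The tag key below (`bedrock:project`) is hypothetical; the request shape follows Cost Explorer's GetCostAndUsage API:

```python
def project_cost_query(tag_key: str, start: str, end: str) -> dict:
    """Build a Cost Explorer get_cost_and_usage request that groups
    monthly unblended cost by a project tag (dates are YYYY-MM-DD)."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }

# Usage (requires boto3 and AWS credentials):
# import boto3
# ce = boto3.client("ce")
# result = ce.get_cost_and_usage(
#     **project_cost_query("bedrock:project", "2026-02-01", "2026-03-01"))
```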

What is the Mantle engine?

Mantle is a distributed inference engine built by AWS that specializes in large-scale model serving. It is responsible for tensor parallel inference across multiple GPU instances and serves as an inference backend for many models provided on Bedrock.

Bedrock also offers models with their own native APIs, such as Anthropic's Claude, but Mantle primarily handles open-weight model inference. Developers get an optimized serving layer on AWS infrastructure without having to manage it themselves.

Context with AWS/OpenAI partnership

In February 2026, AWS and OpenAI announced a strategic partnership. The two companies plan to jointly develop a stateful runtime environment and provide it via Amazon Bedrock, with the main purpose being to support enterprise AI services that require large-scale and complex context processing.

This OpenAI-compatible API sits in the context of that partnership. AWS is enhancing OpenAI compatibility on the Bedrock platform, making it easier for developers familiar with OpenAI's SDK ecosystem to migrate to Bedrock.

Responses API vs Chat Completions API

The two APIs provided by Mantle have very different architectures and uses.

| Item | Responses API | Chat Completions API |
| --- | --- | --- |
| State management | Server side (stateful) | Client side (stateless) |
| Conversation chaining | Linked via previous_response_id | Entire history sent every time |
| Background processing | Supported | Not supported |
| Data retention | Approximately 30 days | ZDR (Zero Data Retention) compliant |
| Compatible models | OpenAI GPT OSS series only | All models on Mantle |
| Tool use | Native support | Supported |

The defining feature of the Responses API is that it maintains conversation state on the server side. Passing previous_response_id automatically restores the prior context, so the client does not need to manage and resend the full history. The longer the conversation, the more bandwidth this saves.

On the other hand, the Responses API retains data for approximately 30 days. In environments where ZDR is mandated by regulatory requirements, the Chat Completions API is the only choice. Additionally, the Responses API currently supports only OpenAI GPT OSS 20B and 120B, so to use DeepSeek or Mistral you must choose the Chat Completions API.

flowchart TD
    A[Choose an API] --> B{ZDR requirement?}
    B -->|Yes| C[Chat Completions API]
    B -->|No| D{Need a model other<br/>than GPT OSS?}
    D -->|Yes| C
    D -->|No| E{Long-running inference or<br/>background processing?}
    E -->|Yes| F[Responses API]
    E -->|No| G{Want server-side<br/>conversation management?}
    G -->|Yes| F
    G -->|No| C
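The same decision flow can be written as a tiny helper function. This is purely illustrative of the selection logic above, not an official API:

```python
def choose_api(zdr_required: bool, needs_non_gpt_oss_model: bool,
               long_running: bool, server_side_state: bool) -> str:
    """Encode the API-selection flowchart: ZDR or a non-GPT-OSS model
    forces Chat Completions; long-running work or server-side state
    favors Responses; otherwise default to Chat Completions."""
    if zdr_required or needs_non_gpt_oss_model:
        return "Chat Completions API"
    if long_running or server_side_state:
        return "Responses API"
    return "Chat Completions API"
```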

Background processing

The Responses API has a background mode. Requests are submitted to a queue and processed asynchronously, and the client waits for completion by polling.

sequenceDiagram
    participant Client
    participant Mantle
    Client->>Mantle: POST /v1/responses<br/>(background: true)
    Mantle-->>Client: 202 Accepted<br/>(response_id)
    loop Polling
        Client->>Mantle: GET /v1/responses/{id}
        Mantle-->>Client: status: in_progress
    end
    Client->>Mantle: GET /v1/responses/{id}
    Mantle-->>Client: status: completed<br/>(result)

This avoids HTTP connections timing out during long-running inference tasks, and is useful for agent-style workloads with complex inference and tool-call chains.
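A generic polling loop for this pattern might look like the following sketch. In practice, fetch_status would wrap something like client.responses.retrieve(response_id).status (field names assumed from the OpenAI SDK); the sleep and clock parameters exist only to make the helper easy to test:

```python
import time

def poll_until_done(fetch_status, poll_interval: float = 2.0,
                    timeout: float = 600.0,
                    sleep=time.sleep, clock=time.monotonic) -> str:
    """Call fetch_status() until it reports a terminal state,
    sleeping between attempts, or raise on timeout."""
    deadline = clock() + timeout
    while clock() < deadline:
        status = fetch_status()
        if status in ("completed", "failed", "cancelled"):
            return status
        sleep(poll_interval)
    raise TimeoutError("background response did not finish within timeout")
```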

Supported regions and endpoints

The endpoint format is unified as bedrock-mantle.{region}.api.aws, and the Tokyo region is supported.

| Region | Code | Endpoint |
| --- | --- | --- |
| US East (N. Virginia) | us-east-1 | bedrock-mantle.us-east-1.api.aws |
| US East (Ohio) | us-east-2 | bedrock-mantle.us-east-2.api.aws |
| US West (Oregon) | us-west-2 | bedrock-mantle.us-west-2.api.aws |
| Asia Pacific (Tokyo) | ap-northeast-1 | bedrock-mantle.ap-northeast-1.api.aws |
| Asia Pacific (Mumbai) | ap-south-1 | bedrock-mantle.ap-south-1.api.aws |
| Asia Pacific (Jakarta) | ap-southeast-3 | bedrock-mantle.ap-southeast-3.api.aws |
| Europe (Frankfurt) | eu-central-1 | bedrock-mantle.eu-central-1.api.aws |
| Europe (Ireland) | eu-west-1 | bedrock-mantle.eu-west-1.api.aws |
| Europe (London) | eu-west-2 | bedrock-mantle.eu-west-2.api.aws |
| Europe (Milan) | eu-south-1 | bedrock-mantle.eu-south-1.api.aws |
| Europe (Stockholm) | eu-north-1 | bedrock-mantle.eu-north-1.api.aws |
| South America (São Paulo) | sa-east-1 | bedrock-mantle.sa-east-1.api.aws |
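Because the endpoint pattern is uniform, the base URL for any region can be derived mechanically. A minimal helper (the /v1 suffix matches the base-URL form used in the migration example in this article):

```python
def mantle_endpoint(region: str) -> str:
    """Return the OpenAI-compatible base URL for an AWS region,
    following the bedrock-mantle.{region}.api.aws pattern."""
    return f"https://bedrock-mantle.{region}.api.aws/v1"
```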

In February 2026, PrivateLink support was also expanded to allow access to Mantle's OpenAI-compatible APIs via VPC endpoints. Bedrock inference can run without traversing the public internet, which makes deployment straightforward even in environments with strict security requirements.

Migration code example

Migrating from the OpenAI API to Bedrock Mantle requires changing only two environment variables.

# Direct OpenAI usage
export OPENAI_API_KEY=sk-xxxxx
export OPENAI_BASE_URL=https://api.openai.com/v1

# Switch to Bedrock Mantle
export OPENAI_API_KEY=<Amazon Bedrock API key>
export OPENAI_BASE_URL=https://bedrock-mantle.ap-northeast-1.api.aws/v1

The only change in the Python code is the model ID.

from openai import OpenAI

client = OpenAI()  # reads key and base URL from environment variables

# Direct OpenAI: model="gpt-4o"
# Bedrock Mantle: swap in the Bedrock model ID
completion = client.chat.completions.create(
    model="mistral.mistral-large-2411-v1:0",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

A stateful conversation with the Responses API looks like this.

# Turn 1
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "Tell me about asynchronous processing in Python"}]
)

# Turn 2 (context is restored automatically on the server side)
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[{"role": "user", "content": "How does it differ from asyncio?"}],
    previous_response_id=response.id
)

Implementation notes

OpenAI API compatibility does not mean that every feature is fully supported.

  • Support for stream=True (streaming) is model dependent
  • Advanced parameters such as n > 1 (multiple candidate generation) and logprobs may not be supported
  • Compatibility with the Embeddings API is currently unconfirmed

Assume the core Chat Completions API works, but verify behavior with your own tests before going live. The accurate mental model is that the core functionality is compatible, not that existing OpenAI code will work end to end after only a model ID swap.
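One way to pre-flight these gaps is a small capability probe that attempts each optional parameter and records what the endpoint accepts. This is a sketch, not an exhaustive test; note in particular that streaming errors may surface only once the stream is consumed:

```python
def probe_compatibility(client, model_id: str) -> dict:
    """Try optional Chat Completions parameters against a model and
    record which calls the endpoint accepts without raising."""
    trials = {
        "streaming": {"stream": True},   # errors may surface only on iteration
        "n_gt_1": {"n": 2},
        "logprobs": {"logprobs": True},
    }
    results = {}
    for feature, extra in trials.items():
        try:
            client.chat.completions.create(
                model=model_id,
                messages=[{"role": "user", "content": "ping"}],
                max_tokens=1,
                **extra,
            )
            results[feature] = True
        except Exception:
            results[feature] = False
    return results
```

Running this once per candidate model in a staging environment gives a quick feature matrix before any production traffic is switched over.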

Known compatibility issues

An issue reported on AWS re:Post is incompatibility with the official OpenAI .NET SDK.

  • The created_at field is returned in scientific notation (as a float), and .NET SDKs expecting an integer fail to parse it
  • Content sent in array format may be rejected

While no major issues have been reported with the Python SDK so far, additional testing is required when using SDKs for other languages such as .NET or Go. Mantle's compatibility layer appears to target the OpenAI Python SDK first, and edge cases in other language SDKs may be addressed later.
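On the client side, the created_at issue can be worked around by normalizing the value before handing it to strict parsers. A Python sketch of such a shim (the same idea applies in .NET with a custom JSON converter):

```python
def parse_created_at(value) -> int:
    """Normalize a created_at field that may arrive as an int, a float,
    or a scientific-notation string into a plain Unix timestamp."""
    return int(float(value))
```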