
Sakura AI Engine lets you use a free LLM API 3,000 times a month

Ikesan

Sakura AI Engine, provided by Sakura Internet, is an LLM inference API platform hosted entirely in Japan. It is compatible with the OpenAI API and can be used free of charge for up to 3,000 requests per month. In March 2026, Moonshot AI’s Kimi-K2.5 model also became available in public preview.

What Sakura AI Engine is

The service became generally available in September 2025. It provides LLM inference and RAG through API calls alone.

| Feature | Details |
| --- | --- |
| OpenAI API compatible | Existing OpenAI SDKs and tools can use it by changing the endpoint |
| Hosted in Japan | All data processing stays on Japanese servers, and customer data is not used for training |
| Works with closed networks | Also supports VPN, LGWAN, and private networks |
| Free tier | Text generation up to 3,000 requests/month, transcription up to 50 requests/month, embeddings up to 10,000 requests/month |
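For capacity planning, the free-tier ceilings above are easy to check programmatically. This is a simple illustrative helper with the limits copied from the table, not a feature of any official SDK:

```python
# Free-tier monthly request limits, taken from the table above.
FREE_TIER_LIMITS = {
    "text_generation": 3_000,
    "transcription": 50,
    "embeddings": 10_000,
}

def fits_free_tier(service: str, monthly_requests: int) -> bool:
    """True if the expected monthly request count stays within the free tier."""
    return monthly_requests <= FREE_TIER_LIMITS[service]

print(fits_free_tier("text_generation", 100))  # → True
print(fits_free_tier("transcription", 200))    # → False
```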

For companies that cannot send data overseas, it is a realistic alternative to the OpenAI and Claude APIs.

Available models

Chat Completions

| Model | Provider | Input | Output | Notes |
| --- | --- | --- | --- | --- |
| gpt-oss-120b | OpenAI | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Free-tier target |
| Qwen3-Coder-480B-A35B-Instruct-FP8 | Alibaba Cloud | 0.3 yen / 10,000 tokens | 2.5 yen / 10,000 tokens | Coding-specialized |
| Qwen3-Coder-30B-A3B-Instruct | Alibaba Cloud | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Lightweight version |
| llm-jp-3.1-8x13b-instruct4 | LLM-jp | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Domestically developed MoE model |
| PLaMo 2.0-31B | Preferred Networks | Contact sales | Contact sales | Domestically developed |
| cotomi v3 | NEC | Contact sales | Contact sales | Domestically developed |

Public preview

| Model | Provider | Input | Output |
| --- | --- | --- | --- |
| preview/Kimi-K2.5 | Moonshot AI | 0.6 yen / 10,000 tokens | 3.0 yen / 10,000 tokens |
| preview/Qwen3-VL-30B-A3B-Instruct | Alibaba Cloud | - | - |
| preview/Phi-4-multimodal-instruct | Microsoft | - | - |

Other services

| Service | Model | Price | Free tier |
| --- | --- | --- | --- |
| Audio transcription | whisper-large-v3-turbo | 0.5 yen / 60 seconds | 50 requests/month |
| Embeddings | multilingual-e5-large | 2 yen / 10,000 tokens | 10,000 requests/month |
| Voice synthesis | VOICEVOX | 3 yen / 10,000 mora | 50 requests/month |
| RAG | - | 3 yen / 100 chunks | - |
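To get a feel for the embeddings pricing, here is a rough cost estimate using the per-unit rate from the table above (2 yen per 10,000 tokens). The token counts are illustrative, and billing details such as rounding are not modeled:

```python
# Rate from the "Other services" table: 2 yen per 10,000 embedded tokens.
EMBED_RATE_YEN_PER_10K_TOKENS = 2.0

def embedding_cost_yen(total_tokens: int) -> float:
    """Metered cost in yen for embedding `total_tokens` tokens."""
    return total_tokens / 10_000 * EMBED_RATE_YEN_PER_10K_TOKENS

# Example: embedding 50 documents of ~300 tokens each (15,000 tokens total).
print(embedding_cost_yen(15_000))  # → 3.0
```

Within the free tier, up to 10,000 such requests per month cost nothing; the estimate only applies to usage beyond that.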

Pricing plans

There are two plans:

Base-model free plan

A plan that only uses the free tier. If you go over the limit, requests are delayed or rejected. A credit card is required, but you will not be charged if you stay within the free tier.

Pay-as-you-go plan

You pay for usage beyond the free limit. One user reported that gpt-oss-120b cost about 138 yen for 110 requests totaling 1.6 million input tokens and 140,000 output tokens, which is very cheap for personal development.
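For budgeting your own workload, a back-of-envelope estimator using the gpt-oss-120b rates from the model table above (0.15 yen / 10,000 input tokens, 0.75 yen / 10,000 output tokens) looks like this. Billing details such as rounding, minimums, or the free-tier offset are not modeled:

```python
# Pay-as-you-go rates for gpt-oss-120b, from the model table above.
IN_RATE_YEN_PER_10K = 0.15   # input tokens
OUT_RATE_YEN_PER_10K = 0.75  # output tokens

def gpt_oss_cost_yen(input_tokens: int, output_tokens: int) -> float:
    """Estimated metered cost in yen for one billing period."""
    return (input_tokens / 10_000 * IN_RATE_YEN_PER_10K
            + output_tokens / 10_000 * OUT_RATE_YEN_PER_10K)

# Example: 100,000 input tokens and 20,000 output tokens.
print(round(gpt_oss_cost_yen(100_000, 20_000), 2))  # → 3.0
```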

Kimi-K2.5 in public preview

On March 17, 2026, Moonshot AI’s Kimi-K2.5 was added to Sakura AI Engine.

Specs

| Item | Value |
| --- | --- |
| Total parameters | 1T |
| Active parameters | about 32B |
| Architecture | Mixture-of-Experts |
| Experts | 384 total, 8 selected per token plus 1 shared |
| Layers | 61 |
| Attention hidden dimension | 7,168 |
| MoE hidden dimension per expert | 2,048 |
| Attention heads | 64 |
| Attention mechanism | MLA |
| Vision encoder | MoonViT |
| Training data | about 15T tokens |
| Vocabulary | 160,000 |
| Activation | SwiGLU |
| Knowledge cutoff | Based on April 2024, with partial coverage into October |

Because the MoE design activates only a fraction of the full parameter set for each token, the model carries 1T parameters' worth of knowledge at roughly the compute cost of a 32B-class dense model.
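As a rough sanity check on those numbers (an illustrative calculation, not an official parameter breakdown): selecting 8 routed experts plus 1 shared expert out of 384 activates about 9/384 of the expert parameters per token, which lands in the same ballpark as the reported ~32B once attention and embedding parameters are added.

```python
# Back-of-envelope check of the spec table. Assumes, for illustration,
# that most of the 1T parameters live in the experts; attention and
# embedding parameters (not modeled here) account for the remainder
# of the ~32B active parameters.
TOTAL_PARAMS = 1.0e12
EXPERTS_TOTAL = 384
EXPERTS_ACTIVE = 8 + 1  # 8 routed + 1 shared per token

active_expert_params = TOTAL_PARAMS * EXPERTS_ACTIVE / EXPERTS_TOTAL
print(f"{active_expert_params / 1e9:.1f}B")  # → 23.4B, same order as ~32B
```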

What you can do

  • Document understanding
  • Code generation
  • Image captioning
  • Multimodal Q&A

Because it is only a public preview, stability and quality are not guaranteed.

How to start using it

```mermaid
graph TD
    A[Sakura Internet account] --> B[Sakura Cloud project]
    B --> C[Credit card registration]
    C --> D[Enable AI Engine from the control panel]
    D --> E[Issue API token]
    E --> F[Send API requests]
```

The API is OpenAI-compatible, so existing OpenAI SDKs work after changing the base URL.

```python
from openai import OpenAI

# Point the official OpenAI SDK at Sakura AI Engine's endpoint.
client = OpenAI(
    base_url="https://ai-engine.sakura.ad.jp/v1",
    api_key="YOUR_API_TOKEN",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Tell me about Sakura AI Engine"}],
)
print(response.choices[0].message.content)
```

It can also be used with Coding Intelligence and MCP servers.

Who it is for

It is a realistic option for companies and local governments that cannot send data to overseas clouds. Individual developers can also try it for free for prototypes and side projects.

The main drawback is that model performance is not always on par with GPT-4o or Claude Sonnet; on the other hand, no other service currently combines domestic hosting, low cost, and OpenAI compatibility.

Kimi-K2.5 limitations

The public Sakura API does not include Kimi’s web-search plugin, so it cannot answer beyond its knowledge cutoff unless you add your own RAG data.
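A minimal sketch of that workaround, assuming you already have passages retrieved from your own document store (the retrieval step, the question, and the passage text here are all hypothetical): prepend the retrieved context to the chat messages before sending the request.

```python
# Minimal sketch of supplying your own retrieved context to work around
# the knowledge cutoff. `retrieved_passages` stands in for whatever your
# own RAG store returns; retrieval itself is out of scope here.
def build_messages(question: str, retrieved_passages: list[str]) -> list[dict]:
    context = "\n\n".join(retrieved_passages)
    return [
        {"role": "system",
         "content": "Answer using only the reference material below.\n\n" + context},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "What changed in the 2026 release?",
    ["(hypothetical passage fetched from your own document store)"],
)
# These messages would then be passed to the OpenAI-compatible endpoint:
# client.chat.completions.create(model="preview/Kimi-K2.5", messages=messages)
```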