# Sakura AI Engine lets you use a free LLM API 3,000 times a month
Sakura AI Engine, provided by Sakura Internet, is an LLM inference API platform hosted entirely in Japan. It is compatible with the OpenAI API and can be used free of charge for up to 3,000 requests per month. In March 2026, Moonshot AI’s Kimi-K2.5 model also became available in public preview.
## What Sakura AI Engine is

The service became generally available in September 2025, and both LLM inference and RAG are available through API calls alone.
| Feature | Details |
|---|---|
| OpenAI API compatible | Existing OpenAI SDKs and tools can use it by changing the endpoint |
| Hosted in Japan | All data processing stays on Japanese servers, and customer data is not used for training |
| Works with closed networks | Also supports VPN, LGWAN, and private networks |
| Free tier | Text generation up to 3,000 requests/month, transcription up to 50 requests/month, embeddings up to 10,000 requests/month |
For companies that cannot send data overseas, it is a realistic alternative to the OpenAI and Claude APIs.
## Available models

### Chat Completions
| Model | Provider | Input | Output | Notes |
|---|---|---|---|---|
| gpt-oss-120b | OpenAI | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Eligible for the free tier |
| Qwen3-Coder-480B-A35B-Instruct-FP8 | Alibaba Cloud | 0.3 yen / 10,000 tokens | 2.5 yen / 10,000 tokens | Coding specialized |
| Qwen3-Coder-30B-A3B-Instruct | Alibaba Cloud | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Lightweight version |
| llm-jp-3.1-8x13b-instruct4 | LLM-jp | 0.15 yen / 10,000 tokens | 0.75 yen / 10,000 tokens | Domestic MoE model |
| PLaMo 2.0-31B | Preferred Networks | Contact sales | Contact sales | Domestically developed |
| cotomi v3 | NEC | Contact sales | Contact sales | Domestically developed |
### Public preview
| Model | Provider | Input | Output |
|---|---|---|---|
| preview/Kimi-K2.5 | Moonshot AI | 0.6 yen / 10,000 tokens | 3.0 yen / 10,000 tokens |
| preview/Qwen3-VL-30B-A3B-Instruct | Alibaba Cloud | - | - |
| preview/Phi-4-multimodal-instruct | Microsoft | - | - |
### Other services
| Service | Model | Price | Free tier |
|---|---|---|---|
| Audio transcription | whisper-large-v3-turbo | 0.5 yen / 60 seconds | 50 requests/month |
| Embeddings | multilingual-e5-large | 2 yen / 10,000 tokens | 10,000 requests/month |
| Voice synthesis | VOICEVOX | 3 yen / 10,000 mora | 50 requests/month |
| RAG | - | 3 yen / 100 chunks | - |
## Pricing plans
There are two plans:
### Base-model free plan

A plan that uses only the free tier. Requests beyond the limit are delayed or rejected. A credit card is required, but no charges apply as long as you stay within the free tier.
### Pay-as-you-go plan
You pay for usage beyond the free limit. One reported example says gpt-oss-120b costs about 138 yen for 110 requests, 1.6 million input tokens, and 140,000 output tokens, which is very cheap for personal development.
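As a sanity check, the token charge for a given workload can be computed directly from the per-10,000-token rates in the pricing table above. A rough sketch (rates are the listed gpt-oss-120b prices; an actual bill may also cover other models and services):

```python
def inference_cost_yen(input_tokens: int, output_tokens: int,
                       input_rate: float = 0.15, output_rate: float = 0.75) -> float:
    """Cost in yen at the listed gpt-oss-120b rates (yen per 10,000 tokens)."""
    return (input_tokens / 10_000) * input_rate + (output_tokens / 10_000) * output_rate

# The example workload from the text: 1.6M input tokens, 140k output tokens
print(inference_cost_yen(1_600_000, 140_000))  # → 34.5
```

Note that the raw token charge here comes out lower than the 138 yen reported above, so that bill presumably included additional usage.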
## Kimi-K2.5 in public preview
On March 17, 2026, Moonshot AI’s Kimi-K2.5 was added to Sakura AI Engine.
### Specs
| Item | Value |
|---|---|
| Total parameters | 1T |
| Active parameters | about 32B |
| Architecture | Mixture-of-Experts |
| Experts | 384 total, 8 selected per token plus 1 shared |
| Layers | 61 |
| Attention hidden dimension | 7,168 |
| MoE hidden dimension per expert | 2,048 |
| Attention heads | 64 |
| Attention mechanism | MLA |
| Vision encoder | MoonViT |
| Training data | about 15T tokens |
| Vocabulary | 160,000 |
| Activation | SwiGLU |
| Knowledge cutoff | Based on April 2024, with partial coverage into October |
The MoE design activates only a fraction of the full parameter set for each token, so the model carries 1T parameters' worth of knowledge at roughly the compute cost of a 32B-class dense model.
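The routing step can be illustrated with a toy sketch: for each token, the router ranks all 384 experts, runs only the top 8 plus the single shared expert, and skips the rest entirely. (This is a generic top-k router illustration, not Kimi-K2.5's actual routing code.)

```python
import random

N_EXPERTS = 384  # routed experts in Kimi-K2.5
TOP_K = 8        # experts selected per token
SHARED = 1       # always-active shared expert

def select_experts(router_scores):
    """Pick the indices of the top-k routed experts for one token."""
    ranked = sorted(range(len(router_scores)),
                    key=router_scores.__getitem__, reverse=True)
    return ranked[:TOP_K]

scores = [random.random() for _ in range(N_EXPERTS)]
active = select_experts(scores)

# Only (TOP_K + SHARED) expert FFNs run for this token, so compute scales
# with the ~32B active parameters rather than the full 1T.
print(len(active) + SHARED)  # → 9
```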
### What you can do
- Document understanding
- Code generation
- Image captioning
- Multimodal Q&A
Because it is only a public preview, stability and quality are not guaranteed.
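Since the API is OpenAI-compatible, a multimodal request should follow the standard OpenAI chat format, mixing text and image parts in one message. A sketch of the request body (the payload shape is the stock OpenAI one; the example URL is a placeholder, and whether the preview endpoint accepts every field is not confirmed):

```python
# Request body for an image-captioning call to the preview model.
# Built but not sent here; pass it to the Chat Completions endpoint.
payload = {
    "model": "preview/Kimi-K2.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }
    ],
}
```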
## How to start using it
```mermaid
graph TD
    A[Sakura Internet account] --> B[Sakura Cloud project]
    B --> C[Credit card registration]
    C --> D[Enable AI Engine from the control panel]
    D --> E[Issue API token]
    E --> F[Send API requests]
```
The API is OpenAI-compatible, so existing OpenAI SDKs work after changing the base URL.
```python
from openai import OpenAI

# Point the standard OpenAI client at the Sakura AI Engine endpoint
client = OpenAI(
    base_url="https://ai-engine.sakura.ad.jp/v1",
    api_key="YOUR_API_TOKEN",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Tell me about Sakura AI Engine"}],
)
print(response.choices[0].message.content)
```
It can also be used with Coding Intelligence and MCP servers.
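Many OpenAI-compatible tools read the SDK's standard environment variables, so pointing them at Sakura AI Engine is often just two exports (the variable names are the OpenAI SDK defaults; individual tools may use their own):

```shell
# Redirect OpenAI-SDK-based tools to the Sakura AI Engine endpoint
export OPENAI_BASE_URL="https://ai-engine.sakura.ad.jp/v1"
export OPENAI_API_KEY="YOUR_API_TOKEN"
```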
## Who it is for
It is a realistic option for companies and local governments that cannot send data to overseas clouds. Individual developers can also try it for free for prototypes and side projects.
Its main drawback is that performance is not always on par with GPT-4o or Claude Sonnet, but no other offering currently combines domestic hosting, low cost, and OpenAI compatibility.
## Kimi-K2.5 limitations

The public Sakura API does not include Kimi's web-search plugin, so the model cannot answer questions about events after its knowledge cutoff unless you supply your own RAG data.