Japanese LLMs Have Multiplied — Here's What's Actually Inside Them
Since the start of 2026, there’s been a surge of LLMs claiming to be good at Japanese.
But “Japanese-specialized” can mean wildly different things. Some were trained from scratch; others just had Japanese bolted on after the fact.
With Japan’s new fiscal year underway, it’s a good time to take stock.
Three Flavors of “Japanese-Specialized”
Japanese-capable LLMs fall into three broad categories based on how they were trained.
| Approach | What it means | Examples |
|---|---|---|
| Scratch training | Borrow only the architecture, train weights from zero | LLM-jp-4, PLaMo, cotomi |
| Continued pre-training | Take existing model weights, train more on Japanese corpora | Nemotron Nano 9B JP, Swallow, Rakuten AI 3.0 |
| Post-training | Adjust behavior with SFT/RLHF on an existing model | Namazu |
LLM-jp-4 was trained on 11.7 trillion tokens from scratch. Namazu applied post-training to DeepSeek’s weights. Both call themselves Japanese LLMs, but their development costs and model characteristics are completely different.
This isn’t about scratch being better — it’s about different goals.
Overview
Major Japanese LLMs available as of April 2026.
| Model | Developer | Approach | Size | Benchmark | License |
|---|---|---|---|---|---|
| LLM-jp-4 | NII | Scratch | 32B MoE (3.8B active) | MT-Bench JA 7.82 | Apache 2.0 |
| LFM2.5-JP | Liquid AI | Scratch | 1.2B | JMMLU 50.7 | LFM Open License |
| PLaMo 2.0 | PFN | Scratch | 31B | Undisclosed | Undisclosed |
| cotomi v3 | NEC | Scratch | Undisclosed | Undisclosed | Undisclosed |
| LLM-jp-3.1 | LLM-jp Consortium | Scratch | MoE (8x13B) | Undisclosed | TBD |
| Nemotron Nano 9B JP | NVIDIA | Continued pre-training | 9B | #1 in sub-10B on Nejumi 4 | NVIDIA Open Model |
| Swallow 30B-A3B | Tokyo Tech / AIST | Continued pre-training + RL | 30B MoE (3B active) | — | TBD |
| Rakuten AI 3.0 | Rakuten | Continued pre-training | Undisclosed | Undisclosed | Undisclosed |
| Namazu | Sakana AI | Post-training | Multiple sizes | — | Depends on base model |
Scratch-Trained Models
LLM-jp-4-32B-A3B (NII)
Japan’s National Institute of Informatics (NII) trained this model on 11.7 trillion tokens from zero.
The architecture is based on Qwen3MoE, but the weights are entirely new. No synthetic data from GPT or Claude was used.
Japanese makes up only 3.5% of the corpus, but was 4.5x oversampled during training to reach 15.9%.
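Both figures are rounded as reported, and the arithmetic checks out: a quick sanity check in Python lands within rounding distance of the stated 15.9%.

```python
# Effective share of Japanese data after oversampling.
# Inputs are the rounded figures from the announcement: 3.5% of the
# corpus, oversampled 4.5x during training.
raw_share = 0.035
oversample = 4.5

effective_share = raw_share * oversample
print(f"{effective_share:.1%}")  # ~15.8%, matching the reported 15.9% within rounding
```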
The result: MT-Bench JA 7.82, beating GPT-4o’s 7.29.
On my EVO-X2 (Strix Halo), it hit 62.9 t/s — 41% faster than Qwen3.5-35B-A3B’s 44.7 t/s. Having half the experts (128 vs 256) and fewer layers (32 vs 40) helps.
It’s a thinking model, so creative prompts can exhaust the thinking budget before any content is generated. Controlling `--reasoning-budget` is essential.
Safety filters are also very strict, and no abliterated version exists.
A 32B dense model and a 332B-A31B MoE (332 billion parameters, 31 billion active) are planned for release within fiscal year 2026.
LFM2.5-1.2B-JP (Liquid AI)
At just 1.2B parameters, it scores JMMLU 50.7 and M-IFEval (JA) 58.1, beating Qwen3-1.7B on all Japanese benchmarks.
It uses a convolution-plus-attention hybrid architecture (no SSM) that runs roughly 2x faster than transformers on CPU and edge devices.
Best option in this size class for running Japanese LLMs on edge devices.
PLaMo 2.0, cotomi v3, LLM-jp-3.1
PLaMo (Preferred Networks), cotomi (NEC), and LLM-jp-3.1 (LLM-jp Consortium) are all domestically scratch-trained models.
All three are available via Sakura Internet’s “Sakura AI Engine” API.
PLaMo and cotomi require individual pricing inquiries.
LLM-jp-3.1 costs ¥0.15/10K input tokens and ¥0.75/10K output tokens, with a free tier of 3,000 requests/month.
Continued Pre-Training Models
Nemotron Nano 9B Japanese (NVIDIA)
NVIDIA’s “sovereign AI” play for Japan. A 9B model that ranked #1 in the sub-10B category on Nejumi Leaderboard 4.
Transformer-Mamba hybrid architecture delivers up to 6x throughput compared to same-size open-source models.
Training data includes Japanese Wikipedia, Aozora Bunko (public domain literature), and SIP3 corpus, plus NVIDIA’s own Nemotron datasets.
SFT used a dataset built from 6 million personas based on Japanese demographic data.
At 9B, it runs on a single edge GPU. Well-suited for on-premises enterprise use.
Particularly strong at tool calling and coding.
Qwen3-Swallow 30B-A3B (Tokyo Tech / AIST)
Qwen3 with continued pre-training and RL for Japanese.
In NDLOCR-Lite OCR correction testing, its vocabulary fixes (“一方交通→一方通行” for “one-way traffic”, “受けー方→受け側” for “receiving side”) were more natural than Qwen3.5’s.
GGUF versions have issues with thinking control — be cautious when running locally.
→ OCR correction comparison article
Rakuten AI 3.0
Announced as “Japan’s largest-scale high-performance AI model” under the GENIAC government subsidy program. But right after release, `"model_type": "deepseek_v3"` was found in `config.json`, revealing it as a DeepSeek-V3 base.
The initial release had DeepSeek’s MIT license file removed. It was added back after community backlash.
Using DeepSeek-V3 is fine — it’s MIT licensed.
But concealing the base model while presenting it as a domestically-developed model funded by government subsidies is worth knowing about.
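This kind of check is easy to reproduce yourself: Hugging Face model repos ship a `config.json` whose `model_type` field names the architecture family. A minimal sketch, parsing a local copy of the file (the sample content below is illustrative, not the full Rakuten config):

```python
import json

def base_architecture(config_text: str) -> str:
    """Return the model_type declared in a Hugging Face config.json."""
    return json.loads(config_text)["model_type"]

# Illustrative snippet of the kind of config.json that surfaced the issue.
sample = '{"model_type": "deepseek_v3", "hidden_size": 7168}'
print(base_architecture(sample))  # deepseek_v3
```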
Post-Training Models
Namazu (Sakana AI)
Post-training applied to existing models like DeepSeek-V3.1-Terminus and Llama 3.1 405B.
Primary goal is correcting biases related to Japanese politics and history — a different aim from the other models here.
The weights are borrowed, but applying targeted bias corrections to already-capable models is a pragmatic approach.
Fun fact: “Namazu” collides with a full-text search engine from 1997.
→ Name collision article
API Options
If running locally is too much hassle, Sakura Internet’s “Sakura AI Engine” offers a domestic alternative.
All processing stays in Japanese data centers. OpenAI API compatible.
| Model | Input (per 10K tokens) | Output (per 10K tokens) | Free tier |
|---|---|---|---|
| LLM-jp-3.1 8x13B | ¥0.15 | ¥0.75 | Yes |
| PLaMo 2.0-31B | Contact sales | Contact sales | — |
| cotomi v3 | Contact sales | Contact sales | — |
A realistic alternative to OpenAI API or Claude API for projects where data cannot leave Japan (government, financial institutions, etc.).
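Since the endpoint is OpenAI-compatible, a request is just the usual chat-completions payload. The sketch below uses only the rates from the table above; the endpoint URL and model identifier are placeholders I made up, so check Sakura Internet’s documentation for the real values before use.

```python
import json
import urllib.request

# Placeholder endpoint; the real URL is in Sakura AI Engine's docs.
SAKURA_ENDPOINT = "https://example.invalid/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """OpenAI-style chat-completions body accepted by compatible endpoints."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def estimate_cost_yen(input_tokens: int, output_tokens: int,
                      in_rate: float = 0.15, out_rate: float = 0.75) -> float:
    """Cost in yen at the published LLM-jp-3.1 rates (per 10K tokens)."""
    return input_tokens / 10_000 * in_rate + output_tokens / 10_000 * out_rate

payload = build_payload("llm-jp-3.1-8x13b", "こんにちは")  # model ID is a guess
req = urllib.request.Request(
    SAKURA_ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Authorization": "Bearer <API_KEY>",
             "Content-Type": "application/json"},
)
# Actually sending `req` requires the real endpoint and an API key.

print(estimate_cost_yen(1_000, 500))  # ≈ 0.0525 yen for a small request
```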
Choosing by Use Case
| Use case | Pick | Why |
|---|---|---|
| Local, Japanese quality first | LLM-jp-4 | MT-Bench JA 7.82, 62 t/s |
| Edge / 9B size | Nemotron Nano 9B JP | #1 in sub-10B, strong tool calling |
| As small as possible | LFM2.5-1.2B-JP | 1.2B, runs on CPU |
| API, data stays in Japan | Sakura AI Engine | LLM-jp-3.1 has free tier |
| Japanese OCR correction | Swallow | Natural vocabulary fixes |