
Japanese LLMs Have Multiplied — Here's What's Actually Inside Them

Ikesan

Since the start of 2026, there’s been a surge of LLMs claiming to be good at Japanese.
But “Japanese-specialized” can mean wildly different things. Some were trained from scratch; others just had Japanese bolted on after the fact.
With the new fiscal year starting in Japan, it's a good time to take stock.

Three Flavors of “Japanese-Specialized”

Japanese-capable LLMs fall into three broad categories based on how they were trained.

| Approach | What it means | Examples |
|---|---|---|
| Scratch training | Borrow only the architecture, train weights from zero | LLM-jp-4, PLaMo, cotomi |
| Continued pre-training | Take existing model weights, train further on Japanese corpora | Nemotron Nano 9B JP, Swallow, Rakuten AI 3.0 |
| Post-training | Adjust behavior with SFT/RLHF on an existing model | Namazu |

LLM-jp-4 was trained on 11.7 trillion tokens from scratch. Namazu applied post-training to DeepSeek’s weights. Both call themselves Japanese LLMs, but the development cost and model characteristics are completely different.
This isn’t about scratch being better — it’s about different goals.

Overview

Major Japanese LLMs available as of April 2026.

| Model | Developer | Approach | Size | Benchmark | License |
|---|---|---|---|---|---|
| LLM-jp-4 | NII | Scratch | 32B MoE (3.8B active) | MT-Bench JA 7.82 | Apache 2.0 |
| LFM2.5-JP | Liquid AI | Scratch | 1.2B | JMMLU 50.7 | LFM Open License |
| PLaMo 2.0 | PFN | Scratch | 31B | Undisclosed | Undisclosed |
| cotomi v3 | NEC | Scratch | Undisclosed | Undisclosed | Undisclosed |
| LLM-jp-3.1 | LLM-jp Consortium | Scratch | MoE (8x13B) | Undisclosed | TBD |
| Nemotron Nano 9B JP | NVIDIA | Continued pre-training | 9B | #1 in sub-10B on Nejumi 4 | NVIDIA Open Model |
| Swallow 30B-A3B | Tokyo Tech / AIST | Continued pre-training + RL | 30B MoE (3B active) | TBD | — |
| Rakuten AI 3.0 | Rakuten | Continued pre-training | Undisclosed | Undisclosed | Undisclosed |
| Namazu | Sakana AI | Post-training | Multiple sizes | — | Depends on base model |

Scratch-Trained Models

LLM-jp-4-32B-A3B (NII)

Japan’s National Institute of Informatics trained this model from zero on 11.7 trillion tokens.
The architecture is based on Qwen3MoE, but the weights are entirely new. No synthetic data from GPT or Claude was used.

Japanese makes up only 3.5% of the raw corpus, but it was oversampled 4.5x during training, bringing its effective share to 15.9%.
The result: MT-Bench JA 7.82, beating GPT-4o’s 7.29.

On my EVO-X2 (Strix Halo), it hit 62.9 t/s — 41% faster than Qwen3.5-35B-A3B’s 44.7 t/s. Having half the experts (128 vs 256) and fewer layers (32 vs 40) helps.
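The 41% figure is simple relative-throughput arithmetic on the two measurements above; a quick sanity check:

```python
# Relative speedup between two measured generation throughputs (tokens/sec).
# Numbers are the measurements quoted above; your hardware will vary.
llmjp4_tps = 62.9  # LLM-jp-4-32B-A3B on EVO-X2 (Strix Halo)
qwen_tps = 44.7    # Qwen3.5-35B-A3B on the same machine

speedup = llmjp4_tps / qwen_tps - 1.0
print(f"{speedup:.1%}")  # → 40.7%, i.e. roughly 41% faster
```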

It’s a thinking model, so creative prompts can exhaust the thinking budget before any content is generated; controlling `--reasoning-budget` is essential.
Safety filters are also very strict, and no abliterated version exists.
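For local runs, a sketch of a llama.cpp `llama-server` launch with the flag mentioned above. The model filename is illustrative, and flag semantics can differ between builds (recent llama.cpp builds accept `-1` for unlimited and `0` to disable thinking), so check `llama-server --help` on your build:

```shell
# Hypothetical launch: disable the thinking phase for creative prompts
# so the model doesn't burn its whole budget before producing output.
llama-server \
  -m llm-jp-4-32b-a3b-Q4_K_M.gguf \
  --reasoning-budget 0 \
  --port 8080
```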

A 32B Dense model and a 332B-A31B MoE (332 billion parameters, 31 billion active) are planned for release within fiscal year 2026.

Benchmark article

LFM2.5-1.2B-JP (Liquid AI)

At just 1.2B parameters, it scores JMMLU 50.7 and M-IFEval (JA) 58.1, beating Qwen3-1.7B on all Japanese benchmarks.
Its convolution + attention hybrid architecture (no SSM) runs roughly 2x faster than transformers on CPU and edge devices.

Best option in this size class for running Japanese LLMs on edge devices.

Architecture deep-dive

PLaMo 2.0, cotomi v3, LLM-jp-3.1

PLaMo (Preferred Networks), cotomi (NEC), and LLM-jp-3.1 (LLM-jp Consortium) are all domestically scratch-trained models.
All three are available via Sakura Internet’s “Sakura AI Engine” API.

PLaMo and cotomi require individual pricing inquiries.
LLM-jp-3.1 costs ¥0.15/10K input tokens and ¥0.75/10K output tokens, with a free tier of 3,000 requests/month.
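At those rates, a back-of-the-envelope cost estimate is straightforward (a sketch using the listed prices; the function name is mine):

```python
def llmjp31_cost_yen(input_tokens: int, output_tokens: int) -> float:
    """Estimate Sakura AI Engine cost for LLM-jp-3.1 at the listed rates:
    ¥0.15 per 10K input tokens, ¥0.75 per 10K output tokens."""
    return input_tokens / 10_000 * 0.15 + output_tokens / 10_000 * 0.75

# Example: a batch job with 1M input tokens and 200K output tokens
print(llmjp31_cost_yen(1_000_000, 200_000))  # → 30.0 (yen)
```

Output tokens dominate the bill at 5x the input rate, so thinking-style models with long generations cost disproportionately more here.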

Sakura AI Engine article

Continued Pre-Training Models

Nemotron Nano 9B Japanese (NVIDIA)

NVIDIA’s “sovereign AI” play for Japan. A 9B model that ranked #1 in the sub-10B category on Nejumi Leaderboard 4.

Its Transformer-Mamba hybrid architecture delivers up to 6x the throughput of same-size open-source models.
Training data includes Japanese Wikipedia, Aozora Bunko (public domain literature), and SIP3 corpus, plus NVIDIA’s own Nemotron datasets.
SFT used a dataset built from 6 million personas based on Japanese demographic data.

At 9B, it runs on a single edge GPU. Well-suited for on-premises enterprise use.
Particularly strong at tool calling and coding.

Detailed article

Qwen3-Swallow 30B-A3B (Tokyo Tech / AIST)

Qwen3 with continued pre-training and RL for Japanese.
In NDLOCR-Lite OCR correction testing, its vocabulary fixes (一方交通→一方通行, 受けー方→受け側) were more natural than Qwen3.5’s.

GGUF versions have issues with thinking control — be cautious when running locally.

OCR correction comparison article

Rakuten AI 3.0

Announced as “Japan’s largest-scale high-performance AI model” under the GENIAC government subsidy program, but `"model_type": "deepseek_v3"` was found in `config.json` right after release, revealing a DeepSeek-V3 base.
The initial release had DeepSeek’s MIT license file removed. It was added back after community backlash.

Using DeepSeek-V3 is fine — it’s MIT licensed.
But concealing the base model while presenting it as a domestically-developed model funded by government subsidies is worth knowing about.

Post-Training Models

Namazu (Sakana AI)

Post-training applied to existing models like DeepSeek-V3.1-Terminus and Llama 3.1 405B.
Primary goal is correcting biases related to Japanese politics and history — a different aim from the other models here.

The weights are borrowed, but applying targeted bias corrections to already-capable models is a pragmatic approach.

Fun fact: “Namazu” collides with a full-text search engine from 1997.
Name collision article

API Options

If running locally is too much hassle, Sakura Internet’s “Sakura AI Engine” offers a domestic alternative.
All processing stays in Japanese data centers. OpenAI API compatible.
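Since the API is OpenAI-compatible, existing client code mostly just needs a different base URL and model id. A minimal sketch of the request shape; the base URL below is a placeholder (use the endpoint from your Sakura account) and the model id is illustrative:

```python
import json

# Placeholder endpoint — substitute your actual Sakura AI Engine URL.
BASE_URL = "https://example.sakura-ai-engine.invalid/v1"

# Standard /v1/chat/completions request body, same schema as OpenAI's API.
payload = {
    "model": "llm-jp-3.1-8x13b",  # illustrative model id
    "messages": [
        {"role": "user", "content": "Summarize this contract clause in Japanese."},
    ],
}

body = json.dumps(payload, ensure_ascii=False)
print(body)
```

Because the schema matches, OpenAI SDKs that accept a custom `base_url` can usually be pointed at the endpoint directly instead of hand-building requests.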

| Model | Input (per 10K tokens) | Output (per 10K tokens) | Free tier |
|---|---|---|---|
| LLM-jp-3.1 8x13B | ¥0.15 | ¥0.75 | Yes |
| PLaMo 2.0-31B | Contact sales | Contact sales | — |
| cotomi v3 | Contact sales | Contact sales | — |

A realistic alternative to OpenAI API or Claude API for projects where data cannot leave Japan (government, financial institutions, etc.).

Sakura AI Engine article

Choosing by Use Case

| Use case | Pick | Why |
|---|---|---|
| Local, Japanese quality first | LLM-jp-4 | MT-Bench JA 7.82, 62 t/s |
| Edge / 9B size | Nemotron Nano 9B JP | #1 in sub-10B, strong tool calling |
| As small as possible | LFM2.5-1.2B-JP | 1.2B, runs on CPU |
| API, data stays in Japan | Sakura AI Engine | LLM-jp-3.1 has free tier |
| Japanese OCR correction | Swallow | Natural vocabulary fixes |