
Japanese LLMs Have Multiplied — Here's What's Actually Inside Them

Ikesan

Since the start of 2026, there’s been a surge of LLMs claiming to be good at Japanese.
But “Japanese-specialized” can mean wildly different things. Some were trained from scratch; others just had Japanese bolted on after the fact.
With the new fiscal year starting in Japan, it's a good time to take stock.

Three Flavors of “Japanese-Specialized”

Japanese-capable LLMs fall into three broad categories based on how they were trained.

| Approach | What it means | Examples |
|---|---|---|
| Scratch training | Borrow only the architecture, train weights from zero | LLM-jp-4, PLaMo, cotomi |
| Continued pre-training | Take existing model weights, train further on Japanese corpora | Nemotron Nano 9B JP, Swallow, Rakuten AI 3.0 |
| Post-training | Adjust behavior with SFT/RLHF on an existing model | Namazu |

LLM-jp-4 was trained on 11.7 trillion tokens from scratch. Namazu applied post-training to DeepSeek’s weights. Both call themselves Japanese LLMs, but the development cost and model characteristics are completely different.
This isn’t about scratch being better — it’s about different goals.

Overview

Major Japanese LLMs available as of April 2026.

| Model | Developer | Approach | Size | Benchmark | License |
|---|---|---|---|---|---|
| LLM-jp-4 | NII | Scratch | 32B MoE (3.8B active) | MT-Bench JA 7.82 | Apache 2.0 |
| LFM2.5-JP | Liquid AI | Scratch | 1.2B | JMMLU 50.7 | LFM Open License |
| PLaMo 2.0 | PFN | Scratch | 31B | Undisclosed | Undisclosed |
| cotomi v3 | NEC | Scratch | Undisclosed | Undisclosed | Undisclosed |
| LLM-jp-3.1 | LLM-jp Consortium | Scratch | MoE (8x13B) | Undisclosed | TBD |
| Nemotron Nano 9B JP | NVIDIA | Continued pre-training | 9B | #1 in sub-10B on Nejumi 4 | NVIDIA Open Model |
| Swallow 30B-A3B | Tokyo Tech / AIST | Continued pre-training + RL | 30B MoE (3B active) | TBD | — |
| Rakuten AI 3.0 | Rakuten | Continued pre-training | Undisclosed | Undisclosed | Undisclosed |
| Namazu | Sakana AI | Post-training | Multiple sizes | — | Depends on base model |

Scratch-Trained Models

LLM-jp-4-32B-A3B (NII)

Japan’s National Institute of Informatics trained this model from zero on 11.7 trillion tokens.
The architecture is based on Qwen3MoE, but the weights are entirely new. No synthetic data from GPT or Claude was used.

Japanese makes up only 3.5% of the raw corpus, but it was oversampled 4.5x during training, bringing its effective share to 15.9%.
The result: MT-Bench JA 7.82, beating GPT-4o’s 7.29.

On my EVO-X2 (Strix Halo), it hit 62.9 t/s — 41% faster than Qwen3.5-35B-A3B’s 44.7 t/s. Having half the experts (128 vs 256) and fewer layers (32 vs 40) helps.
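The 41% figure is simple relative-throughput arithmetic on the two measurements above; a quick sanity check:

```python
# Relative speedup between two measured generation throughputs (tokens/sec).
# Numbers are the measurements quoted above; your hardware will vary.
llmjp4_tps = 62.9  # LLM-jp-4-32B-A3B on EVO-X2 (Strix Halo)
qwen_tps = 44.7    # Qwen3.5-35B-A3B on the same machine

speedup = llmjp4_tps / qwen_tps - 1.0
print(f"{speedup:.1%}")  # → 40.7%, i.e. roughly 41% faster
```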

It’s a thinking model, so creative prompts can exhaust the thinking budget before any content is generated; controlling `--reasoning-budget` is essential.
Safety filters are also very strict, and no abliterated version exists.
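For local runs, a sketch of a llama.cpp `llama-server` launch with the flag mentioned above. The model filename is illustrative, and flag semantics can differ between builds (recent llama.cpp builds accept `-1` for unlimited and `0` to disable thinking), so check `llama-server --help` on your build:

```shell
# Hypothetical launch: disable the thinking phase for creative prompts
# so the model doesn't burn its whole budget before producing output.
llama-server \
  -m llm-jp-4-32b-a3b-Q4_K_M.gguf \
  --reasoning-budget 0 \
  --port 8080
```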

A 32B Dense model and a 332B-A31B MoE (332 billion parameters, 31 billion active) are planned for release within fiscal year 2026.

Benchmark article

LFM2.5-1.2B-JP (Liquid AI)

At just 1.2B parameters, it scores JMMLU 50.7 and M-IFEval (JA) 58.1, beating Qwen3-1.7B on all Japanese benchmarks.
Its convolution + attention hybrid architecture (no SSM) runs roughly 2x faster than transformers on CPU and edge devices.

Best option in this size class for running Japanese LLMs on edge devices.

Architecture deep-dive

PLaMo 2.0, cotomi v3, LLM-jp-3.1

PLaMo (Preferred Networks), cotomi (NEC), and LLM-jp-3.1 (LLM-jp Consortium) are all domestically scratch-trained models.
All three are available via Sakura Internet’s “Sakura AI Engine” API.

PLaMo and cotomi require individual pricing inquiries.
LLM-jp-3.1 costs ¥0.15/10K input tokens and ¥0.75/10K output tokens, with a free tier of 3,000 requests/month.
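At those rates, a back-of-the-envelope cost estimate is straightforward (a sketch using the listed prices; the function name is mine):

```python
def llmjp31_cost_yen(input_tokens: int, output_tokens: int) -> float:
    """Estimate Sakura AI Engine cost for LLM-jp-3.1 at the listed rates:
    ¥0.15 per 10K input tokens, ¥0.75 per 10K output tokens."""
    return input_tokens / 10_000 * 0.15 + output_tokens / 10_000 * 0.75

# Example: a batch job with 1M input tokens and 200K output tokens
print(llmjp31_cost_yen(1_000_000, 200_000))  # → 30.0 (yen)
```

Output tokens dominate the bill at 5x the input rate, so thinking-style models with long generations cost disproportionately more here.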

Sakura AI Engine article

Continued Pre-Training Models

Nemotron Nano 9B Japanese (NVIDIA)

NVIDIA’s “sovereign AI” play for Japan. A 9B model that ranked #1 in the sub-10B category on Nejumi Leaderboard 4.

Its Transformer-Mamba hybrid architecture delivers up to 6x the throughput of same-size open-source models.
Training data includes Japanese Wikipedia, Aozora Bunko (public domain literature), and SIP3 corpus, plus NVIDIA’s own Nemotron datasets.
SFT used a dataset built from 6 million personas based on Japanese demographic data.

At 9B, it runs on a single edge GPU. Well-suited for on-premises enterprise use.
Particularly strong at tool calling and coding.

Detailed article

Qwen3-Swallow 30B-A3B (Tokyo Tech / AIST)

Qwen3 with continued pre-training and RL for Japanese.
In NDLOCR-Lite OCR correction testing, its vocabulary fixes (一方交通→一方通行, 受けー方→受け側) were more natural than Qwen3.5’s.

GGUF versions have issues with thinking control — be cautious when running locally.

OCR correction comparison article

Rakuten AI 3.0

Announced as “Japan’s largest-scale high-performance AI model” under the GENIAC government subsidy program, but `"model_type": "deepseek_v3"` was found in `config.json` right after release, revealing a DeepSeek-V3 base.
The initial release had DeepSeek’s MIT license file removed. It was added back after community backlash.

Using DeepSeek-V3 is fine — it’s MIT licensed.
But concealing the base model while presenting it as a domestically-developed model funded by government subsidies is worth knowing about.

Post-Training Models

Namazu (Sakana AI)

Post-training applied to existing models like DeepSeek-V3.1-Terminus and Llama 3.1 405B.
Primary goal is correcting biases related to Japanese politics and history — a different aim from the other models here.

The weights are borrowed, but applying targeted bias corrections to already-capable models is a pragmatic approach.

Fun fact: “Namazu” collides with a full-text search engine from 1997.
Name collision article

API Options

If running locally is too much hassle, Sakura Internet’s “Sakura AI Engine” offers a domestic alternative.
All processing stays in Japanese data centers. OpenAI API compatible.
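Since the API is OpenAI-compatible, existing client code mostly just needs a different base URL and model id. A minimal sketch of the request shape; the base URL below is a placeholder (use the endpoint from your Sakura account) and the model id is illustrative:

```python
import json

# Placeholder endpoint — substitute your actual Sakura AI Engine URL.
BASE_URL = "https://example.sakura-ai-engine.invalid/v1"

# Standard /v1/chat/completions request body, same schema as OpenAI's API.
payload = {
    "model": "llm-jp-3.1-8x13b",  # illustrative model id
    "messages": [
        {"role": "user", "content": "Summarize this contract clause in Japanese."},
    ],
}

body = json.dumps(payload, ensure_ascii=False)
print(body)
```

Because the schema matches, OpenAI SDKs that accept a custom `base_url` can usually be pointed at the endpoint directly instead of hand-building requests.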

| Model | Input (per 10K tokens) | Output (per 10K tokens) | Free tier |
|---|---|---|---|
| LLM-jp-3.1 8x13B | ¥0.15 | ¥0.75 | Yes |
| PLaMo 2.0-31B | Contact sales | Contact sales | — |
| cotomi v3 | Contact sales | Contact sales | — |

A realistic alternative to OpenAI API or Claude API for projects where data cannot leave Japan (government, financial institutions, etc.).

Sakura AI Engine article

Choosing by Use Case

| Use case | Pick | Why |
|---|---|---|
| Local, Japanese quality first | LLM-jp-4 | MT-Bench JA 7.82, 62 t/s |
| Edge / 9B size | Nemotron Nano 9B JP | #1 in sub-10B, strong tool calling |
| As small as possible | LFM2.5-1.2B-JP | 1.2B, runs on CPU |
| API, data stays in Japan | Sakura AI Engine | LLM-jp-3.1 has free tier |
| Japanese OCR correction | Swallow | Natural vocabulary fixes |