NVIDIA Nemotron 2 Nano 9B Japanese - The No.1 Japanese Model Under 10B for Sovereign AI
NVIDIA has released a small Japanese-focused language model, “Nemotron-Nano-9B-v2-Japanese”. On the Nejumi Leaderboard 4 (a major benchmark for Japanese LLMs), it ranks first in the under-10B parameter category.
Despite its compact 9B parameter size, the model delivers top-tier Japanese performance. It can run on edge GPUs and is clearly designed with on-premises enterprise use in mind.
Architecture and Technical Specs
The base model is Nemotron-Nano-9B-v2, which adopts a hybrid Transformer + Mamba architecture.
| Item | Spec |
|---|---|
| Parameter count | 9B (9 billion) |
| Throughput | Up to 6× vs. similarly sized open-source models (reported) |
| Inference | Edge GPU support |
| Tool calling | Optimized for structured data generation |
| Fine-tuning | Feasible on modest compute |
The reported throughput (up to 6× over similarly sized open-source models) likely owes much to the Mamba layers, and the gains should be most visible in long-context inference, where a Transformer's KV cache keeps growing while an SSM state does not.
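To see why the hybrid design helps at long context, here is a back-of-envelope comparison of decode-time memory for a Transformer KV cache versus a Mamba-style SSM state. The layer counts and dimensions below are illustrative assumptions, not the actual Nemotron-Nano-9B-v2 configuration.

```python
def kv_cache_bytes(seq_len, n_layers=36, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2 tensors (K and V) per layer, each [seq_len, n_kv_heads, head_dim];
    # grows linearly with context length
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per

def mamba_state_bytes(n_layers=36, d_inner=4096, d_state=16, bytes_per=2):
    # one fixed-size SSM state per layer, independent of context length
    return n_layers * d_inner * d_state * bytes_per

for ctx in (4_096, 32_768, 131_072):
    kv = kv_cache_bytes(ctx) / 2**20
    ssm = mamba_state_bytes() / 2**20
    print(f"{ctx:>7} tokens: KV cache ~ {kv:8.1f} MiB | SSM state ~ {ssm:4.1f} MiB")
```

With these assumed dimensions the KV cache grows from hundreds of MiB to tens of GiB as context lengthens, while the SSM state stays a few MiB, which is the intuition behind the long-context throughput claim.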
Japanese Training Data
The following corpora are used for continued pretraining.
Open-source Japanese corpora
- Wikipedia (Japanese edition)
- fineweb-2 Japanese
- Aozora Bunko (a digital library of public-domain Japanese literature)
- SIP3-ja-general-web-corpus
NVIDIA datasets
- Nemotron-CC-v2.1
- Nemotron-Pretraining-Specialized-v1
Assets from LLM-jp, Japan’s open-source LLM community, are also leveraged. It’s interesting that Aozora Bunko is included: because it contains classical Japanese, the model can potentially handle a broader range of vocabulary and styles, not just modern Japanese.
Supervised Fine-tuning (SFT)
For SFT, the dataset “Nemotron-Personas-Japan” is used. It is built from 6 million personas and reflects real-world Japanese demographic data (geographic distribution and diversity of personality traits). It’s released under the CC BY 4.0 license and is valuable in its own right.
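A persona-seeded SFT record might be assembled along the following lines, in the spirit of Nemotron-Personas-Japan. The field names, system-prompt wording, and chat format here are assumptions for illustration; the released dataset defines its own schema.

```python
import json

def build_sft_record(persona: dict, user_prompt: str, assistant_reply: str) -> dict:
    # Turn persona attributes into a system prompt that conditions the model
    # on a specific demographic/personality profile.
    system = (
        f"あなたは{persona['region']}在住の{persona['occupation']}です。"
        f"性格: {persona['traits']}。丁寧な日本語で回答してください。"
    )
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ]
    }

persona = {"region": "大阪府", "occupation": "看護師", "traits": "几帳面で世話好き"}
record = build_sft_record(persona, "夜勤明けの体調管理のコツは?", "まずは十分な睡眠を…")
print(json.dumps(record, ensure_ascii=False)[:80])
```

Scaling this pattern over 6 million personas is what gives the SFT data its geographic and personality diversity.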
The tool-calling dataset is also generated with these personas as seeds, revealing a design philosophy that assumes agentic use.
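As a minimal sketch of what "optimized for structured data generation" buys you in practice: define a tool schema and validate a model-emitted JSON call against it. The schema shape below follows the common OpenAI-style convention; Nemotron's exact tool-calling template may differ.

```python
import json

# Hypothetical tool definition for illustration
WEATHER_TOOL = {
    "name": "get_weather",
    "parameters": {
        "required": ["city"],
        "properties": {"city": {"type": "string"}},
    },
}

def parse_tool_call(raw: str, tool: dict) -> dict:
    # Parse the model's raw JSON output and check it against the schema.
    call = json.loads(raw)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unknown tool: {call.get('name')}")
    missing = [k for k in tool["parameters"]["required"]
               if k not in call.get("arguments", {})]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call

call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "東京"}}',
                       WEATHER_TOOL)
print(call["arguments"]["city"])  # 東京
```

A model trained heavily on this kind of structured output fails validation less often, which is exactly what matters in agentic loops.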
Benchmark Results
Nejumi Leaderboard 4 evaluates across roughly 40 benchmarks.
- Basic language ability: Japanese understanding and generation
- Agentic ability: code generation, mathematical reasoning, tool use
- Alignment: instruction following, bias, toxicity, truthfulness, robustness
It outperforms similarly sized models such as Qwen3-8B. Notably, it shows strengths in tool calling and coding.
Toolchain Used for Development
- Megatron-LM: continued pretraining and SFT
- NeMo Curator: data preprocessing and filtering
- NeMo Framework: customization (Megatron-Bridge, AutoModel, NeMo-RL)
The model is developed end to end with NVIDIA’s in-house toolchain. Because the same tools can be used for fine-tuning, the barrier is low for enterprises to adapt it for domain-specific use.
Intended Use Cases
- Customer-support agents
- Internal automation tools
- Domain-specialized assistants
- Prototyping multi-agent systems
Although it’s only 9B, it exhibits strong agentic capabilities, making it well-suited as a node in a multi-agent setup. You can compose complex workflows without the overhead of a large model.
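One way to use the model as a node in such a setup is to give each role its own system prompt and send requests to a locally hosted OpenAI-compatible endpoint (for example, one served with vLLM). The node prompts and the model id below are illustrative assumptions.

```python
# Role-specialized "nodes" sharing one local model; each node is just a
# system prompt plus a request payload for an OpenAI-compatible server.
NODES = {
    "support": "あなたはカスタマーサポート担当です。簡潔に回答してください。",
    "coder": "あなたはコーディングアシスタントです。",
}

def build_request(node: str, task: str,
                  model: str = "nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese"):
    # Build the JSON body for POST /v1/chat/completions (model id assumed).
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": NODES[node]},
            {"role": "user", "content": task},
        ],
        "temperature": 0.2,
    }

req = build_request("support", "返品の手続きを教えてください。")
print(req["messages"][0]["role"], len(req["messages"]))
```

Because every node hits the same 9B model on one GPU, adding a role is just adding a prompt, not provisioning another deployment.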
No.1 in Japanese under 10B, able to run on a single RTX 4090, and strong at tool calling: a very attractive local model for agent workflows. It's also notable how prominently NVIDIA is foregrounding "Sovereign AI".