NVIDIA Nemotron 2 Nano 9B Japanese - The No.1 Japanese Model Under 10B for Sovereign AI
NVIDIA has released a small Japanese-focused language model, “Nemotron-Nano-9B-v2-Japanese”. On the Nejumi Leaderboard 4 (a major benchmark for Japanese LLMs), it ranks first in the under-10B parameter category.
Despite its compact 9B parameter size, the model delivers top-tier Japanese performance. It can run on edge GPUs and is clearly designed with on-premises enterprise use in mind.
Architecture and Technical Specs
The base model is Nemotron-Nano-9B-v2, which adopts a hybrid Transformer + Mamba architecture.
| Item | Spec |
|---|---|
| Parameter count | 9B (9 billion) |
| Throughput | Up to 6× vs. similarly sized open-source models (reported) |
| Inference | Edge GPU support |
| Tool calling | Optimized for structured data generation |
| Fine-tuning | Feasible on modest compute |
The reported throughput (up to 6× over similarly sized open-source models) likely owes much to the Mamba layers, and the gains should be most visible in long-context inference, where a Transformer's KV cache keeps growing while an SSM state does not.
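To see why the hybrid design helps at long context, here is a back-of-envelope comparison of decode-time memory for a Transformer KV cache versus a Mamba-style SSM state. The layer counts and dimensions below are illustrative assumptions, not the actual Nemotron-Nano-9B-v2 configuration.

```python
def kv_cache_bytes(seq_len, n_layers=36, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2 tensors (K and V) per layer, each [seq_len, n_kv_heads, head_dim];
    # grows linearly with context length
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per

def mamba_state_bytes(n_layers=36, d_inner=4096, d_state=16, bytes_per=2):
    # one fixed-size SSM state per layer, independent of context length
    return n_layers * d_inner * d_state * bytes_per

for ctx in (4_096, 32_768, 131_072):
    kv = kv_cache_bytes(ctx) / 2**20
    ssm = mamba_state_bytes() / 2**20
    print(f"{ctx:>7} tokens: KV cache ~ {kv:8.1f} MiB | SSM state ~ {ssm:4.1f} MiB")
```

With these assumed dimensions the KV cache grows from hundreds of MiB to tens of GiB as context lengthens, while the SSM state stays a few MiB, which is the intuition behind the long-context throughput claim.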
Japanese Training Data
The following corpora are used for continued pretraining.
Open-source Japanese corpora
- Wikipedia (Japanese edition)
- fineweb-2 Japanese
- Aozora Bunko (a digital library of public-domain Japanese literature)
- SIP3-ja-general-web-corpus
NVIDIA datasets
- Nemotron-CC-v2.1
- Nemotron-Pretraining-Specialized-v1
Assets from LLM-jp, Japan’s open-source LLM community, are also leveraged. It’s interesting that Aozora Bunko is included: because it contains classical Japanese, the model can potentially handle a broader range of vocabulary and styles, not just modern Japanese.
Supervised Fine-tuning (SFT)
For SFT, the dataset “Nemotron-Personas-Japan” is used. It is built from 6 million personas and reflects real-world Japanese demographic data (geographic distribution and diversity of personality traits). It’s released under the CC BY 4.0 license and is valuable in its own right.
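A persona-seeded SFT record might be assembled along the following lines, in the spirit of Nemotron-Personas-Japan. The field names, system-prompt wording, and chat format here are assumptions for illustration; the released dataset defines its own schema.

```python
import json

def build_sft_record(persona: dict, user_prompt: str, assistant_reply: str) -> dict:
    # Turn persona attributes into a system prompt that conditions the model
    # on a specific demographic/personality profile.
    system = (
        f"あなたは{persona['region']}在住の{persona['occupation']}です。"
        f"性格: {persona['traits']}。丁寧な日本語で回答してください。"
    )
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ]
    }

persona = {"region": "大阪府", "occupation": "看護師", "traits": "几帳面で世話好き"}
record = build_sft_record(persona, "夜勤明けの体調管理のコツは?", "まずは十分な睡眠を…")
print(json.dumps(record, ensure_ascii=False)[:80])
```

Scaling this pattern over 6 million personas is what gives the SFT data its geographic and personality diversity.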
The tool-calling dataset is also generated with these personas as seeds, revealing a design philosophy that assumes agentic use.
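As a minimal sketch of what "optimized for structured data generation" buys you in practice: define a tool schema and validate a model-emitted JSON call against it. The schema shape below follows the common OpenAI-style convention; Nemotron's exact tool-calling template may differ.

```python
import json

# Hypothetical tool definition for illustration
WEATHER_TOOL = {
    "name": "get_weather",
    "parameters": {
        "required": ["city"],
        "properties": {"city": {"type": "string"}},
    },
}

def parse_tool_call(raw: str, tool: dict) -> dict:
    # Parse the model's raw JSON output and check it against the schema.
    call = json.loads(raw)
    if call.get("name") != tool["name"]:
        raise ValueError(f"unknown tool: {call.get('name')}")
    missing = [k for k in tool["parameters"]["required"]
               if k not in call.get("arguments", {})]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call

call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "東京"}}',
                       WEATHER_TOOL)
print(call["arguments"]["city"])  # 東京
```

A model trained heavily on this kind of structured output fails validation less often, which is exactly what matters in agentic loops.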
Benchmark Results
Nejumi Leaderboard 4 evaluates across roughly 40 benchmarks.
- Basic language ability: Japanese understanding and generation
- Agentic ability: code generation, mathematical reasoning, tool use
- Alignment: instruction following, bias, toxicity, truthfulness, robustness
It outperforms similarly sized models such as Qwen3-8B. Notably, it shows strengths in tool calling and coding.
Toolchain Used for Development
- Megatron-LM: continued pretraining and SFT
- NeMo Curator: data preprocessing and filtering
- NeMo Framework: customization (Megatron-Bridge, AutoModel, NeMo-RL)
The model is developed end to end with NVIDIA’s in-house toolchain. Because the same tools can be used for fine-tuning, the barrier is low for enterprises to adapt it for domain-specific use.
Intended Use Cases
- Customer-support agents
- Internal automation tools
- Domain-specialized assistants
- Prototyping multi-agent systems
Although it’s only 9B, it exhibits strong agentic capabilities, making it well-suited as a node in a multi-agent setup. You can compose complex workflows without the overhead of a large model.
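One way to use the model as a node in such a setup is to give each role its own system prompt and send requests to a locally hosted OpenAI-compatible endpoint (for example, one served with vLLM). The node prompts and the model id below are illustrative assumptions.

```python
# Role-specialized "nodes" sharing one local model; each node is just a
# system prompt plus a request payload for an OpenAI-compatible server.
NODES = {
    "support": "あなたはカスタマーサポート担当です。簡潔に回答してください。",
    "coder": "あなたはコーディングアシスタントです。",
}

def build_request(node: str, task: str,
                  model: str = "nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese"):
    # Build the JSON body for POST /v1/chat/completions (model id assumed).
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": NODES[node]},
            {"role": "user", "content": task},
        ],
        "temperature": 0.2,
    }

req = build_request("support", "返品の手続きを教えてください。")
print(req["messages"][0]["role"], len(req["messages"]))
```

Because every node hits the same 9B model on one GPU, adding a role is just adding a prompt, not provisioning another deployment.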
No.1 in Japanese under 10B, able to run on a single RTX 4090, and strong at tool calling: a very attractive local model for agent workflows. It's also notable how prominently NVIDIA is foregrounding "Sovereign AI".