
NVIDIA Nemotron 2 Nano 9B Japanese - The No.1 Japanese Model Under 10B for Sovereign AI

NVIDIA has released a small Japanese-focused language model, “Nemotron-Nano-9B-v2-Japanese”. On the Nejumi Leaderboard 4 (a major benchmark for Japanese LLMs), it ranks first in the under-10B parameter category.

Despite its compact 9B parameter size, the model delivers top-tier Japanese performance. It can run on edge GPUs and is clearly designed with on-premises enterprise use in mind.

Architecture and Technical Specs

The base model is Nemotron-Nano-9B-v2, which adopts a hybrid Transformer + Mamba architecture.

  • Parameter count: 9B (9 billion)
  • Throughput: up to 6× similarly sized open-source models
  • Inference: runs on edge GPUs
  • Tool calling: optimized for structured data generation
  • Fine-tuning: feasible on modest compute

The reported throughput, up to 6× over similarly sized open-source models, likely owes much of that gain to the Mamba components. The advantage should be most visible in long-context inference, where attention-only models pay a growing per-token cost.
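The long-context argument can be made concrete with rough arithmetic: attention layers must cache keys and values for every past token, while Mamba layers carry a fixed-size state regardless of sequence length. The numbers below (layer count, KV heads, head dimension, and a hybrid with only a few attention layers) are illustrative assumptions, not published specs of Nemotron-Nano-9B-v2.

```python
# Back-of-envelope: why swapping attention layers for Mamba layers shrinks
# per-token inference state. All hyperparameters here are assumptions for
# illustration, not the model's actual configuration.
BYTES_FP16 = 2
kv_heads, head_dim = 8, 128

def kv_cache_bytes(num_attn_layers: int, seq_len: int) -> int:
    # K and V tensors, per attention layer, per token
    return 2 * num_attn_layers * kv_heads * head_dim * BYTES_FP16 * seq_len

full_attn = kv_cache_bytes(40, 32_000)  # hypothetical pure-Transformer stack
hybrid = kv_cache_bytes(4, 32_000)      # hybrid keeping only 4 attention layers
print(f"{full_attn / 2**20:.0f} MiB vs {hybrid / 2**20:.0f} MiB")
# → 5000 MiB vs 500 MiB
```

Under these toy numbers, the KV cache at a 32K context shrinks by an order of magnitude; the Mamba layers' fixed state (omitted above) adds only a small constant on top.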

Japanese Training Data

The following corpora are used for continued pretraining.

Open-source Japanese corpora

  • Wikipedia (Japanese edition)
  • fineweb-2 Japanese
  • Aozora Bunko
  • SIP3-ja-general-web-corpus

NVIDIA datasets

  • Nemotron-CC-v2.1
  • Nemotron-Pretraining-Specialized-v1

Assets from LLM-jp, Japan’s open-source LLM community, are also leveraged. It’s interesting that Aozora Bunko is included: because it contains classical Japanese, the model can potentially handle a broader range of vocabulary and styles, not just modern Japanese.

Supervised Fine-tuning (SFT)

For SFT, the model uses the "Nemotron-Personas-Japan" dataset. It is built from 6 million personas that reflect real-world Japanese demographic data (geographic distribution and diversity of personality traits). The dataset is released under the CC BY 4.0 license and is valuable in its own right.

The tool-calling dataset is also generated with these personas as seeds, revealing a design philosophy that assumes agentic use.
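For a sense of what agentic use looks like in practice, the sketch below builds an OpenAI-style tool-calling request, as one might send to the model served behind an OpenAI-compatible endpoint (e.g. via vLLM). The tool name, its schema, and the served model name are all hypothetical examples, not part of the released dataset.

```python
import json

# Hypothetical tool schema: a customer-lookup function, the kind of
# structured call a support agent built on this model might emit.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_customer",
        "description": "Look up a customer record by ID.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

request = {
    "model": "nemotron-nano-9b-v2-japanese",  # assumed served model name
    "messages": [
        {"role": "user", "content": "顧客ID A-102 の契約状況を教えて"},
    ],
    "tools": tools,
}
payload = json.dumps(request, ensure_ascii=False)
print(payload)
```

A model optimized for structured generation is expected to answer such a request with a well-formed `tool_calls` entry rather than free text.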

Benchmark Results

Nejumi Leaderboard 4 evaluates across roughly 40 benchmarks.

  • Basic language ability: Japanese understanding and generation
  • Agentic ability: code generation, mathematical reasoning, tool use
  • Alignment: instruction following, bias, toxicity, truthfulness, robustness

It outperforms similarly sized models such as Qwen3-8B. Notably, it shows strengths in tool calling and coding.

Toolchain Used for Development

  • Megatron-LM: continued pretraining and SFT
  • NeMo Curator: data preprocessing and filtering
  • NeMo Framework: customization (Megatron-Bridge, AutoModel, NeMo-RL)

The model is developed end to end with NVIDIA’s in-house toolchain. Because the same tools can be used for fine-tuning, the barrier is low for enterprises to adapt it for domain-specific use.

Intended Use Cases

  • Customer-support agents
  • Internal automation tools
  • Domain-specialized assistants
  • Prototyping multi-agent systems

Although it’s only 9B, it exhibits strong agentic capabilities, making it well-suited as a node in a multi-agent setup. You can compose complex workflows without the overhead of a large model.
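As a toy sketch of the multi-agent idea, each node below wraps a role prompt around an LLM call; the stub function stands in for a real client hitting a local model endpoint, and every name here is hypothetical, not an API of the model or its toolchain.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AgentNode:
    """One node in a chained workflow: a role prompt plus an LLM client."""
    name: str
    system_prompt: str
    llm: Callable[[str, str], str]  # (system, user) -> reply

    def run(self, task: str) -> str:
        return self.llm(self.system_prompt, task)

def pipeline(nodes: List[AgentNode], task: str) -> str:
    # Feed each node's output to the next; a compact local model keeps
    # the per-node cost low in such a chain.
    out = task
    for node in nodes:
        out = node.run(out)
    return out

# Stub LLM for illustration; in practice this would call a locally
# hosted Nemotron endpoint.
def stub_llm(system: str, user: str) -> str:
    return f"[{system}] {user}"

planner = AgentNode("planner", "Plan the steps.", stub_llm)
writer = AgentNode("writer", "Write the answer.", stub_llm)
print(pipeline([planner, writer], "Summarize the release notes"))
```

Swapping `stub_llm` for a real client is the only change needed to run the chain against a deployed model.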


No.1 in Japanese under 10B, runs on a single RTX 4090, and strong at tool calling. A very attractive local model for agent workflows. It’s also notable how prominently NVIDIA is foregrounding “Sovereign AI”.

NVIDIA Nemotron 2 Nano 9B Japanese - Hugging Face Blog