Nvidia’s 9B parameter AI model offers toggleable reasoning on single GPU

Nvidia has released Nemotron-Nano-9B-v2, a compact 9-billion-parameter language model with toggleable AI reasoning that achieves top performance in its class on key benchmarks. The model marks Nvidia’s entry into the competitive small language model market, offering enterprises a balance of computational efficiency and advanced reasoning in a package that runs on a single GPU.

What you should know: Nemotron-Nano-9B-v2 combines hybrid architecture with user-controllable reasoning to deliver enterprise-ready AI at reduced computational costs.

  • The model was pruned from 12 billion to 9 billion parameters specifically to fit on a single Nvidia A10 GPU, making deployment more accessible for enterprises.
  • Users can toggle reasoning on or off using simple control tokens like /think or /no_think, allowing developers to balance accuracy with response speed (see the sketch after this list).
  • Runtime “thinking budget” management lets developers cap the number of tokens devoted to internal reasoning, optimizing for specific use cases like customer support or autonomous agents.
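
To make the toggle concrete, here is a minimal sketch of how the control tokens might be used through the Hugging Face transformers chat interface. The repository name, the placement of the control token in the system message, and the generation settings are assumptions for illustration; the article only confirms that /think and /no_think switch reasoning on and off.

```python
# Minimal sketch: toggling Nemotron-Nano-9B-v2 reasoning via control tokens.
# Assumptions (not confirmed by the article): the Hugging Face repo name,
# that the model exposes a standard chat template, and that the control
# token is passed through the system message.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # single A10-class GPU per the article
    trust_remote_code=True,
)

def ask(question: str, reasoning: bool) -> str:
    """Send one question, with internal reasoning toggled on or off."""
    control = "/think" if reasoning else "/no_think"
    messages = [
        {"role": "system", "content": control},   # control-token placement is an assumption
        {"role": "user", "content": question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Reasoning on: slower, but the model works through intermediate steps first.
print(ask("A train travels 120 km in 1.5 hours. What is its average speed?", reasoning=True))

# Reasoning off: faster, direct answers for latency-sensitive paths.
print(ask("What is the capital of Portugal?", reasoning=False))
```

The runtime “thinking budget” is a separate control: this sketch only caps total output length with max_new_tokens, since the article does not describe the budget interface itself.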

Technical architecture: The model uses a fusion of Transformer and Mamba architectures to achieve superior efficiency on long-context tasks.

  • Unlike pure Transformer models that rely entirely on attention layers, Nemotron-Nano-9B-v2 incorporates selective state space models (SSMs) that scale linearly with sequence length.
  • This hybrid approach delivers 2–3× higher throughput on long contexts while maintaining comparable accuracy to traditional models.
  • As Oleksii Kuchiaev, Nvidia Director of AI Model Post-Training, explained: “It is also a hybrid model which allows it to process a larger batch size and be up to 6x faster than similar sized transformer models.”

In plain English: Most AI models process information using “attention layers” that examine every piece of text in relation to every other piece—like reading a book while constantly cross-referencing every sentence with every other sentence. This becomes computationally expensive with longer texts. Nemotron-Nano-9B-v2 uses a hybrid approach that combines these attention layers with “state space models”—think of them as a more efficient way to maintain context that scales better with longer documents, much like how a skilled reader can follow a story’s plot without re-reading every previous page.
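
As a rough illustration of that scaling difference, the toy functions below contrast attention-style mixing, which builds a full token-by-token score matrix, with a state-space-style recurrence that carries a single running state. This is a conceptual sketch, not Nemotron’s actual layers.

```python
# Toy illustration of the scaling difference described above -- not the
# model's real architecture, just the shape of the computation.
import numpy as np

def attention_mixing(x: np.ndarray) -> np.ndarray:
    """Self-attention-style mixing: every token attends to every other token.
    Builds an (L, L) score matrix, so cost and memory grow quadratically in L."""
    scores = x @ x.T / np.sqrt(x.shape[-1])           # (L, L) pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def ssm_mixing(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """State-space-style mixing: a single running state is updated token by
    token, so cost grows linearly in L and the state stays a fixed size."""
    state = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t, token in enumerate(x):                      # one pass over the sequence
        state = decay * state + (1.0 - decay) * token  # compress history into the state
        out[t] = state
    return out

x = np.random.randn(4096, 64)   # 4096 "tokens" with 64-dim embeddings
attention_mixing(x)             # ~4096 x 4096 pairwise scores
ssm_mixing(x)                   # ~4096 state updates
```

For a 4,096-token input the attention path touches roughly 16.8 million pairwise scores, while the recurrent path performs about 4,096 state updates; that gap is the intuition behind the hybrid design’s throughput advantage on long contexts.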

Performance benchmarks: The model demonstrates competitive accuracy across multiple evaluation metrics when tested in “reasoning on” mode.

  • Nemotron-Nano-9B-v2 achieved 72.1% on AIME25, 97.8% on MATH500, 64.0% on GPQA, and 71.1% on LiveCodeBench.
  • Instruction following and long-context performance reached 90.3% on IFEval and 78.9% on the RULER 128K test.
  • Across all benchmarks, the model outperformed Qwen3-8B, a common comparison point in the small language model category.

Enterprise-friendly licensing: Nvidia released the model under a permissive commercial license designed for immediate production deployment.

  • The Nvidia Open Model License Agreement allows commercial use without usage fees, revenue thresholds, or user count restrictions.
  • Enterprises must maintain built-in safety guardrails, include proper attribution when redistributing, and comply with trade regulations and Nvidia’s Trustworthy AI guidelines.
  • Nvidia explicitly states it does not claim ownership of model outputs, leaving rights and responsibility with the deploying organization.

Multilingual capabilities: The model supports multiple languages, including English, German, Spanish, French, Italian, Japanese, Korean, Portuguese, Russian, and Chinese, and handles both instruction following and code generation tasks, making it suitable for global enterprise deployments.

