According to Beating, Nvidia released its flagship large language model, Nemotron 3 Ultra, on June 4. The model has 550 billion total parameters, of which 55 billion are active. It scores 48 on Artificial Analysis' intelligence index, making it the strongest-performing open-source model from the U.S. and placing it behind only Kimi K2.6, which scores 54.
The model uses a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture that alternates Mamba-2 state space layers with Transformer attention layers. This supports a 1 million token context window while limiting the KV cache growth and quadratic attention cost that an all-attention design would incur. Compared with dense models of similar scale, the hybrid architecture reportedly achieves 5x higher throughput and 30% lower inference costs on agent tasks. Nemotron 3 Ultra is available on Hugging Face, NVIDIA NIM, and OpenRouter.
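
To make the long-context argument concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the KV cache for a single 1 million token sequence when every layer is an attention layer versus when only a fraction are, on the general premise that state space layers keep a fixed-size state instead of a per-token cache. The layer counts, head counts, and head dimension are hypothetical placeholders, not Nemotron 3 Ultra's published configuration; only the 550 billion and 55 billion parameter figures come from the report.

```python
def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size in bytes for one sequence.

    Each attention layer stores one key and one value vector per token
    per KV head, hence the leading factor of 2.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configuration for illustration only (assumed values,
# not taken from any Nemotron 3 Ultra specification).
TOTAL_LAYERS = 96          # assumed depth of the network
ATTN_LAYERS_HYBRID = 12    # assumed number of attention layers in the hybrid
N_KV_HEADS = 8             # assumed KV heads per attention layer
HEAD_DIM = 128             # assumed dimension per head
CONTEXT = 1_000_000        # 1 million token context window, as reported

# Cache size if every layer were attention vs. only the hybrid's attention layers.
all_attention = kv_cache_bytes(CONTEXT, TOTAL_LAYERS, N_KV_HEADS, HEAD_DIM)
hybrid = kv_cache_bytes(CONTEXT, ATTN_LAYERS_HYBRID, N_KV_HEADS, HEAD_DIM)

# Share of parameters active per forward pass, from the reported figures.
active_fraction = 55 / 550

if __name__ == "__main__":
    gib = 2 ** 30
    print(f"all-attention KV cache: {all_attention / gib:.1f} GiB")
    print(f"hybrid KV cache:        {hybrid / gib:.1f} GiB")
    print(f"active parameter share: {active_fraction:.0%}")
```

Under these assumed numbers the cache shrinks in proportion to the share of layers that are attention layers. The actual savings depend on the real layer mix and head configuration, which the report does not state.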