Nvidia Opens Nemotron 3 Ultra, 550B-Parameter Flagship Model With Mamba-Transformer Hybrid MoE Architecture

According to Beating, Nvidia released its flagship large language model, Nemotron 3 Ultra, on June 4. The model has 550 billion total parameters, of which 55 billion are active per token. It scores 48 on Artificial Analysis' intelligence index, making it the strongest-performing U.S. open-source model and second only to Kimi K2.6, which scores 54.

The model uses a hybrid Mamba-Transformer MoE architecture that alternates Mamba-2 state space layers with Transformer attention layers. This supports a 1 million token context window while avoiding the KV cache growth and quadratic attention cost that come with using attention in every layer. Compared with dense models of similar scale, the hybrid architecture is reported to deliver 5x higher throughput and 30% lower inference costs on agent tasks. Nemotron 3 Ultra is available on Hugging Face, NVIDIA NIM, and OpenRouter.
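
To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. An attention layer's KV cache grows linearly with context length, while a Mamba-2 layer carries a fixed-size state regardless of how many tokens it has seen. Every number below (layer counts, head counts, head dimension, state size) is a hypothetical placeholder chosen for illustration; the report does not give Nemotron 3 Ultra's actual configuration.

```python
def kv_cache_bytes(seq_len, attn_layers, kv_heads, head_dim, bytes_per_value=2):
    """Memory for cached keys and values across all attention layers.

    Each attention layer stores two tensors (K and V), each of shape
    seq_len x kv_heads x head_dim, so the total grows linearly with seq_len.
    """
    return 2 * attn_layers * seq_len * kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(ssm_layers, state_elems_per_layer, bytes_per_value=2):
    """Memory for recurrent state in state space layers.

    The state size is fixed per layer and does not depend on context length.
    """
    return ssm_layers * state_elems_per_layer * bytes_per_value


# Hypothetical configuration, NOT the published Nemotron 3 Ultra settings.
SEQ_LEN = 1_000_000          # 1 million token context, as in the article
TOTAL_LAYERS = 64            # assumed layer count
HYBRID_ATTN_LAYERS = 8       # assumed: only a few layers use attention
HYBRID_SSM_LAYERS = TOTAL_LAYERS - HYBRID_ATTN_LAYERS
KV_HEADS = 8                 # assumed
HEAD_DIM = 128               # assumed
SSM_STATE_ELEMS = 1_048_576  # assumed state elements per Mamba-2 layer

# Attention in every layer: the cache covers all 64 layers.
pure_attention = kv_cache_bytes(SEQ_LEN, TOTAL_LAYERS, KV_HEADS, HEAD_DIM)

# Hybrid: a small KV cache plus a constant-size state for the SSM layers.
hybrid = (
    kv_cache_bytes(SEQ_LEN, HYBRID_ATTN_LAYERS, KV_HEADS, HEAD_DIM)
    + ssm_state_bytes(HYBRID_SSM_LAYERS, SSM_STATE_ELEMS)
)

if __name__ == "__main__":
    gb = 1e9
    print(f"attention in every layer: {pure_attention / gb:.1f} GB per sequence")
    print(f"hybrid (assumed layout):  {hybrid / gb:.1f} GB per sequence")
```

Under these assumed numbers, the saving comes almost entirely from having fewer layers that need a per-token cache; the state space layers add only a small constant. Real figures depend on the model's actual layer mix, precision, and any cache compression.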
