Challenging NVIDIA's AI software hegemony! Modular builds a cross-hardware, integrated AI platform to compete with CUDA.

At a moment when generative AI is sweeping the globe, almost all LLMs, cloud services, and AI startups rely on the same piece of critical infrastructure: NVIDIA's CUDA software ecosystem. Originally designed for graphics cards, this architecture has expanded over the past two decades into the “invisible operating system” that runs the day-to-day operations of the AI industry.

Founded in 2022, the startup Modular is attempting to challenge this highly centralized, single-supplier structure head-on. Modular's goal is not to create yet another chip, but to build a “portable AI software stack” that lets AI models move freely between different GPUs and accelerators instead of being locked into NVIDIA's CUDA ecosystem.

The 2022 starting point: low-level systems engineers take direct aim at the CUDA core

Modular was founded in 2022 by two software engineers from Apple and Google. CEO Chris Lattner created the Swift programming language and the LLVM compiler infrastructure, while co-founder Tim Davis worked on critical software architecture for Google's TPUs.

Both had experienced firsthand how new hardware opens up a market through software, so they left Big Tech to challenge the AI software hegemony that CUDA represents. This asymmetric war looks almost insane to much of the industry, but it is precisely their deep understanding of low-level systems that makes them one of the few teams with a real chance of attempting it.

Why CUDA is hard to replace: the structural shackles of the AI industry

CUDA began as little more than a tool for making graphics cards programmable, but with the rise of deep learning it has gradually grown into a complete ecosystem spanning languages, libraries, compilers, and inference engines.

For most AI teams, avoiding CUDA is nearly impossible as long as they use NVIDIA GPUs. AMD GPUs, TPUs, and cloud providers' in-house chips do exist on the market, but each kind of hardware remains tied to its own proprietary software, so developers naturally gravitate toward the most mature and complete option, CUDA, producing a highly locked-in industry structure.

An engineering problem nobody had the incentive to solve becomes the breakthrough

Lattner has pointed out that portable AI software spanning chips and vendors is not unimportant; rather, “no one has enough incentive to bear the cost.” Projects of this kind are extremely difficult, have long payback periods, and show almost no commercial results in the short term, yet the capability is one the entire industry wants.

It is precisely this contradiction that led Modular to commit to long-term, low-level systems work before the generative AI boom, and to deliberately stay out of the market spotlight for its first three years.

Three years of quiet groundwork: funding and team gradually fall into place

As of 2025, Modular has raised approximately $380 million in funding, with investors including several leading venture capital firms from Silicon Valley. After completing its latest round of financing in September 2025, the company's valuation is around $1.6 billion.

These resources have allowed Modular to recruit senior engineers from Google and Apple, building a team focused on compilers, system software, and AI infrastructure that continues to refine the complete software stack.

A three-tier software architecture, from language to compute cluster

The technical core of Modular consists of three layers:

Top layer: Mammoth, which helps enterprises schedule and manage compute in multi-GPU, multi-vendor environments, solving practical deployment and operations problems.

Middle layer: the MAX inference engine, which actually runs the models and supports NVIDIA, AMD, and Apple Silicon.

Bottom layer: the Mojo programming language, with syntax close to Python and performance close to C++, which integrates with mainstream AI frameworks (see the sketch after this list).
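
To make the Mojo claim concrete, here is a minimal sketch of the language's Python-like but statically typed style; it follows the syntax in Modular's public documentation, and the function name `scale` is an illustrative placeholder, not Modular's code:

```mojo
# A strictly typed, compiled `fn`: the syntax reads like Python,
# but declared argument and return types let the compiler emit
# native machine code ahead of time.
fn scale(x: Float64, factor: Float64) -> Float64:
    return x * factor

fn main():
    var total: Float64 = 0.0
    # Plain loops compile to native code rather than being interpreted.
    for i in range(10):
        total += scale(Float64(i), 1.5)
    print(total)  # 67.5
```

The same file can also call into Python libraries through Mojo's Python interop, which is how it plugs into existing AI frameworks without rewrites.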

Key validation in 2025: the unified compute layer officially takes shape

In September 2025, Modular announced key test results: on the same software platform it drove both NVIDIA's Blackwell B200 and AMD's MI355X at top-tier performance, with the MI355X running approximately 50% better than on AMD's native software.

Later, on 12/22, Modular Platform 25.6 was officially released, with full support for both data-center and consumer-grade GPUs and, for the first time, direct Mojo support for Apple Silicon. Modular describes it as “Write once, run anywhere,” which means:

“Developers can write code in Mojo once, without creating separate versions for NVIDIA, AMD, and Apple Silicon; the same code runs on different GPUs and hardware from different vendors.”
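
As a concrete illustration of that claim, the sketch below uses `has_accelerator()` from Mojo's standard `sys` module, a compile-time check documented in recent Mojo releases; the messages printed in each branch are hypothetical placeholders rather than Modular's actual dispatch logic:

```mojo
from sys import has_accelerator

fn main():
    # has_accelerator() is evaluated at compile time, so a single
    # source file can specialize for whatever supported GPU (NVIDIA,
    # AMD, or Apple Silicon) the toolchain detects, with no
    # vendor-specific branches in user code.
    @parameter
    if has_accelerator():
        print("Accelerator available: GPU code paths are compiled in")
    else:
        print("No accelerator found: running CPU-only")
```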

It is a symbolic moment of unification: the AI compute layer moving from concept to practical implementation.

The article “Challenging NVIDIA's AI software monopoly! Modular builds a cross-hardware integrated AI platform to confront CUDA” first appeared in Chain News ABMedia.
