In the latest episode of the “All-In Podcast,” the four hosts revealed that OpenAI has reached a significant partnership agreement with the U.S. AI chip startup Cerebras: over the next three years, OpenAI will purchase up to 750 megawatts of computing power from Cerebras, a deal on the order of a billion dollars. Its focus is not model training but a critical aspect of AI commercialization: inference, the computation a model performs to serve real-time responses to users. The hosts argued that this symbolizes a shift in AI industry competition from “who trains the largest models” to “who can provide the fastest, most stable, and most cost-efficient inference.”
Three-Year, 750 MW Compute Contract: OpenAI’s Inference Strategy
The program mentioned that OpenAI has committed to purchasing up to 750 megawatts of computing capacity from Cerebras within three years, roughly the output of a large power plant, dedicated to running AI systems in real-world applications.
The hosts emphasized that this partnership is aimed not at training new models but at supporting the real-time inference needs of ChatGPT, API services, and other AI applications. As users and application scenarios multiply, the cumulative compute consumed by inference will far surpass what training required.
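A back-of-the-envelope calculation shows why serving can dwarf training. The sketch below uses the common approximations of roughly 6·N·D FLOPs to train an N-parameter model on D tokens and roughly 2·N FLOPs per generated token; the model size, traffic volume, and token counts are purely illustrative assumptions, not figures from the episode.

```python
# Back-of-the-envelope comparison of one-time training compute vs. cumulative
# inference compute. Every number here is an illustrative assumption, not a
# figure from OpenAI or the podcast.

PARAMS = 1e12          # assumed model size: 1 trillion parameters
TRAIN_TOKENS = 15e12   # assumed training corpus: 15 trillion tokens

# Common approximations: training ~ 6*N*D FLOPs for N params and D tokens;
# autoregressive decoding ~ 2*N FLOPs per generated token.
train_flops = 6 * PARAMS * TRAIN_TOKENS

REQUESTS_PER_DAY = 2e9     # assumed daily requests across all products
TOKENS_PER_REQUEST = 500   # assumed average generated tokens per request
DAYS = 365

inference_flops_year = 2 * PARAMS * REQUESTS_PER_DAY * TOKENS_PER_REQUEST * DAYS

print(f"training (one-time):      {train_flops:.1e} FLOPs")
print(f"inference (one year):     {inference_flops_year:.1e} FLOPs")
print(f"inference/training ratio: {inference_flops_year / train_flops:.1f}x")
```

Under these assumptions, a single year of serving already costs several times the one-time training run, and the gap widens every year the model stays in production.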
Cerebras’ Unique Chip Design: Entire Wafers as Single Chips
The program’s guest explained that Cerebras has taken a route completely different from mainstream chip design since its founding.
Typically, chip manufacturing cuts many small dies from a single wafer and packages them separately; Cerebras instead designs the entire wafer as one ultra-large chip, integrating vast numbers of compute cores and on-chip memory.
This design dramatically shortens the physical distance between computation and memory, reduces data transfer between separate chips, lowers system complexity, and improves overall computational efficiency.
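As a rough illustration of this data-movement argument, the toy calculation below compares the time to stream one layer’s weights at on-wafer memory bandwidth versus over an inter-chip link. Both bandwidth figures are order-of-magnitude assumptions chosen for illustration, not published Cerebras specifications.

```python
# Toy model of the data-movement argument: moving one layer's weights
# on-wafer vs. across an inter-chip link. Bandwidths are rough,
# order-of-magnitude assumptions, not vendor specifications.

LAYER_BYTES = 2e9        # assumed layer size: 1B parameters at 2 bytes each

ON_WAFER_BW = 2e16       # assumed aggregate on-wafer SRAM bandwidth, bytes/s
INTER_CHIP_BW = 1e12     # assumed inter-chip interconnect bandwidth, bytes/s

def transfer_us(nbytes: float, bandwidth: float) -> float:
    """Microseconds to move nbytes at the given bandwidth."""
    return nbytes / bandwidth * 1e6

print(f"on-wafer:   {transfer_us(LAYER_BYTES, ON_WAFER_BW):10.2f} us")
print(f"inter-chip: {transfer_us(LAYER_BYTES, INTER_CHIP_BW):10.2f} us")
```

Even with generous assumptions for the interconnect, keeping weights on the wafer cuts the transfer time by several orders of magnitude, which is the core of the wafer-scale pitch.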
Cerebras Focuses on Low Latency, Prioritizing Speed in Inference Scenarios
The program pointed out that the key metrics in inference are response speed, latency, and system stability: after a user submits a prompt, the model must finish its computation and return a result almost immediately, and any delay directly degrades the user experience.
Because Cerebras concentrates enormous compute and memory on a single ultra-large chip, data travels very short distances, making it especially suitable for high-frequency, low-latency inference workloads.
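A simple latency model makes the stakes concrete: the time a user waits for a streamed reply is roughly the prefill time plus a per-token decode cost for every output token. The two systems below are hypothetical, with assumed rather than measured figures; they only show how per-token speed comes to dominate long responses.

```python
# Toy end-to-end latency model for a streamed chat reply:
# total = prefill time + per-token decode time * number of output tokens.
# Both "systems" are hypothetical, with assumed (not measured) figures.

def response_latency(prefill_s: float, per_token_s: float, n_tokens: int) -> float:
    """Seconds until the last token of an n_tokens reply arrives."""
    return prefill_s + per_token_s * n_tokens

N_TOKENS = 500  # assumed length of a typical answer

systems = [
    ("baseline accelerator", 0.50, 0.020),  # ~50 tokens/s decode
    ("low-latency design",   0.10, 0.001),  # ~1000 tokens/s decode
]

for name, prefill, per_token in systems:
    total = response_latency(prefill, per_token, N_TOKENS)
    print(f"{name:22s} {total:5.2f} s for a {N_TOKENS}-token reply")
```

Under these assumptions the same 500-token answer takes over ten seconds on the slower system and well under one second on the faster one, which is the difference users actually feel.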
The program mentioned that the earliest large-scale deployments of Cerebras systems were by institutions in the Middle East, including organizations in the United Arab Emirates, and that in production these systems demonstrated significant speed advantages on specific inference tasks.
OpenAI Diversifies Supply Chain to Avoid Single Vendor Risks
The hosts pointed out that in recent years OpenAI has explicitly adopted a “multi-vendor strategy,” no longer relying on a single chip manufacturer.
OpenAI already uses NVIDIA hardware extensively and collaborates with AMD; adding Cerebras creates multiple supply routes for compute. The goal is to spread risk and prevent disruptions caused by the capacity, pricing, or policy changes of any single supplier.
The program described this as a “decentralization” of the AI compute supply chain, ensuring that even if one route encounters issues, services can continue uninterrupted.
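A minimal sketch of what such “decentralization” can look like at the serving layer is shown below: a gateway tries compute routes in priority order and fails over when one is unavailable. The backend names and behavior are hypothetical, illustrating the pattern rather than OpenAI’s actual infrastructure.

```python
# Minimal sketch of the "multiple supply routes" idea from the episode:
# an inference gateway that tries vendor pools in priority order and fails
# over when one route is unavailable. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Backend:
    name: str                      # e.g. an NVIDIA, AMD, or Cerebras pool
    send: Callable[[str], str]     # submits a prompt, returns a completion

class InferenceRouter:
    def __init__(self, backends: list[Backend]):
        self.backends = backends   # ordered by preference (cost, latency, ...)

    def complete(self, prompt: str) -> str:
        last_error = None
        for backend in self.backends:
            try:
                return backend.send(prompt)   # first healthy route wins
            except Exception as err:          # capacity, pricing, outage...
                last_error = err              # remember it, try the next route
        raise RuntimeError("all compute routes failed") from last_error

# Hypothetical usage: one route is down, traffic flows to the next.
def down(_: str) -> str:
    raise ConnectionError("pool unavailable")

router = InferenceRouter([
    Backend("vendor-a-pool", down),
    Backend("vendor-b-pool", lambda p: f"completion for: {p}"),
])
print(router.complete("hello"))   # served by vendor-b-pool
```

In practice such a router would also weigh price and latency per route, but even this bare failover loop captures the resilience argument the hosts made.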
Silicon Chip Industry Reshuffle: Opportunities for New Startups
The hosts believe that this partnership is not only a strategic adjustment for OpenAI but also a sign that the AI chip industry is entering a new round of competition.
Over the next 10 to 20 years, the industry may replay the early days of the personal computer era, when many companies vied for dominance. Not only large chip companies but also many startups could find their footing in inference chips, dedicated accelerators, and chips for vertical applications.
As AI commercialization keeps expanding demand, startups that bet on the right technical direction still have a chance to rise quickly within the industry.
Industry Shift: From Training Race to Inference Competition
The program concluded that in the early stages of AI development the market focused on who could train the largest and most powerful models, but as applications move into production, the real determinant of success will be who can provide inference services faster, cheaper, and more reliably.
The collaboration between OpenAI and Cerebras is seen as an important indicator of this industry shift, demonstrating that AI competition is gradually moving from “training scale” to “inference efficiency.”