Key Takeaway
OpenAI has signed a multi-year deal with Cerebras Systems, valued at over $10 billion, to secure large-scale, low-latency compute capacity through 2028. The agreement underscores both OpenAI's acute infrastructure shortage and its deliberate strategy to diversify beyond Nvidia GPUs, with a particular focus on real-time inference performance.
OpenAI Signs Agreement With Cerebras – Key Points
Deal size, scope, and timeline
OpenAI will purchase up to 750 megawatts of computing capacity over three years, starting in 2026 and extending through 2028, in a contract valued at more than $10 billion. The capacity will come online in multiple tranches, allowing OpenAI to phase integration into its inference stack as demand grows. The systems will directly power ChatGPT and other latency-sensitive workloads, marking one of the largest single compute procurement deals in the AI sector to date.
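For rough context, the headline figures imply a unit cost of capacity. A minimal back-of-envelope sketch, assuming the full contract value maps evenly onto the full 750 MW (the article does not break out tranche pricing, so these are illustrative lower-bound figures, not reported numbers):

```python
# Back-of-envelope figures from the reported deal terms.
# Assumption (not stated in the article): the >$10B value is spread
# evenly across the full 750 MW; actual per-tranche pricing is unknown.
deal_value_usd = 10e9   # "more than $10 billion" (lower bound)
capacity_mw = 750       # megawatts of capacity through 2028
years = 3               # 2026 through 2028

cost_per_mw = deal_value_usd / capacity_mw
cost_per_mw_year = cost_per_mw / years

print(f"Implied cost per MW: ${cost_per_mw / 1e6:.1f}M")        # ~$13.3M
print(f"Implied cost per MW-year: ${cost_per_mw_year / 1e6:.1f}M")  # ~$4.4M
```

At roughly $13M per megawatt, the deal sits in the same order of magnitude as other recent large-scale AI compute commitments, which helps explain why the capacity is being phased in over multiple tranches.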
Compute pressure driven by user scale and workload mix
OpenAI reports well over 900 million weekly users, with demand increasingly shifting toward real-time interactions, longer outputs, code generation, image creation, and AI agent workflows. Company leadership has repeatedly described a “severe shortage” of compute, and the Cerebras agreement specifically targets inference bottlenecks where response speed materially affects user engagement, session length, and workload value.
Wafer-scale chip architecture as a differentiator
Founded over a decade ago, Cerebras builds wafer-scale processors rather than assembling clusters of smaller chips. Its Wafer Scale Engine (WSE) uses an entire silicon wafer as a single processor. The current WSE-3, following the WSE-1 (2020) and WSE-2 (2021), integrates 900,000 AI cores and 4 trillion transistors, combining compute, memory, and bandwidth on one chip to minimize data movement and eliminate inter-GPU communication bottlenecks.
Inference-focused alternative to GPU clusters
Cerebras positions its CS-3 systems as purpose-built for low-latency, high-throughput inference, claiming that a single system can replace hundreds or thousands of GPUs in conventional clusters. OpenAI has stated that these systems are intended to speed up responses that currently take longer to process, improving perceived intelligence, interaction naturalness, and the feasibility of real-time AI applications at scale.
Cerebras business context, funding, and valuation
When Cerebras filed for an IPO in 2024, most of its revenue reportedly came from a single customer, G42. The IPO was subsequently postponed, and the company raised $1.1 billion privately instead. As of January 2026, Cerebras is reported to be in talks to raise another $1 billion at an estimated $22 billion valuation, while continuing to build multiple data centers across North America and Europe to support large inference deployments.
Strategic shift beyond Nvidia and portfolio diversification
The Cerebras partnership fits into OpenAI’s stated compute strategy of matching specific systems to specific workloads. Alongside this deal, OpenAI is collaborating with Broadcom on custom silicon, has signed long-term supply agreements for AMD MI450 accelerators, and previously announced a preliminary, still-unfinalized agreement with Nvidia for up to 10 gigawatts of AI chips. The approach reduces dependency risk, improves cost control, and allows optimization across training and inference separately.
Competitive signal for the inference hardware market
The deal highlights intensifying competition in AI inference, a segment traditionally dominated by GPU vendors. Nvidia itself signed a $20 billion licensing agreement with Groq, another inference-focused chip company, with approximately 90% of Groq’s workforce transitioning to Nvidia. OpenAI’s commitment to Cerebras reinforces the view that inference performance, not just training scale, is becoming a decisive battleground in AI infrastructure.
Why This Matters
Compute availability and latency have become the primary constraints on scaling consumer and enterprise AI services. OpenAI’s commitment of more than $10 billion to wafer-scale, inference-optimized systems signals a structural shift in AI infrastructure strategy, accelerating diversification away from monolithic GPU dependence. If successful, this model could reshape data center architectures, strengthen competition in AI hardware, and make real-time, high-value AI interactions economically viable at global scale.
This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.