AI chip startup Cerebras outperforms NVIDIA’s Blackwell in Llama 4 test

Cerebras has set a new world record for AI inference speed on Meta’s flagship large language model. By delivering more than 2,500 tokens per second on the 400B-parameter Llama 4 Maverick, the company has shown that specialized AI hardware can substantially outpace even the most advanced GPU solutions, resetting performance expectations for enterprise AI deployments.

The big picture: Cerebras has set a world record for LLM inference speed, achieving over 2,500 tokens per second with Meta’s 400B parameter Llama 4 Maverick model.

  • Independent benchmark firm Artificial Analysis measured Cerebras at 2,522 tokens per second per user, more than doubling the 1,038 tokens per second recently announced by NVIDIA’s flagship Blackwell GPUs.
  • This performance milestone makes Cerebras the first and only inference solution to break the 2,500 TPS barrier with the largest and most powerful model in the Llama 4 family.

Why this matters: Inference speed is critical for advanced AI applications, particularly for agents, code generation, and complex reasoning tasks where responsiveness determines usability.

By the numbers: Artificial Analysis tested multiple vendors against the same model, revealing a significant performance gap:

  • Cerebras: 2,522 tokens per second
  • NVIDIA Blackwell: 1,038 tokens per second
  • SambaNova: 794 tokens per second
  • Groq: 549 tokens per second
  • Amazon: 290 tokens per second
  • Google: 125 tokens per second
  • Microsoft Azure: 54 tokens per second
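
To make the per-user throughput gap above concrete, here is a minimal back-of-the-envelope sketch in Python that converts each vendor’s measured rate into wall-clock time for a single response. The 3,000-token response length is a hypothetical assumption chosen for illustration; only the tokens-per-second figures come from the Artificial Analysis measurements listed above.

```python
# Illustrative arithmetic only: estimate how long a fixed-length response takes
# to generate at each vendor's measured per-user throughput (tokens/second),
# using the Artificial Analysis figures cited in the article.
# RESPONSE_TOKENS is a hypothetical response length, not part of the benchmark.

RESPONSE_TOKENS = 3_000  # hypothetical length of an agentic/reasoning response

tokens_per_second = {
    "Cerebras": 2_522,
    "NVIDIA Blackwell": 1_038,
    "SambaNova": 794,
    "Groq": 549,
    "Amazon": 290,
    "Google": 125,
    "Microsoft Azure": 54,
}

for vendor, tps in tokens_per_second.items():
    seconds = RESPONSE_TOKENS / tps
    print(f"{vendor:18s} {tps:>5d} tok/s -> {seconds:6.1f} s per response")
```

At these rates, a response that streams back in roughly a second on Cerebras takes close to a minute on the slowest endpoint in the list, which is why inference speed is framed here as a usability question for agents and reasoning workloads rather than a raw benchmark bragging right.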

What they’re saying: “Cerebras is the only inference solution that outperforms Blackwell for Meta’s flagship model,” said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis.

Competitive advantage: Cerebras says its hardware is already tuned for Llama 4, with the record-setting performance available to customers immediately and no special software optimizations required.

Source: Cerebras beats NVIDIA Blackwell: Llama 4 Maverick Inference
