AWS and Cerebras Collaboration Sets a New Standard for AI Inference Speed and Performance in the Cloud

The fastest way to deploy Llama models

Run Llama 3.3, Llama 4 Scout, and Llama 4 Maverick, powered by Cerebras and available now. This partnership brings Meta’s most advanced models to life with unmatched inference speed, unlocking real-time reasoning, voice, and agentic AI at scale.

Llama 3.3

2,500 tokens/sec

High-performance, cost-efficient, and multilingual: 15x faster than a GPU.

Llama 4 Scout

2,600 tokens/sec

The smallest and fastest member of the Llama 4 family, built for speed and efficiency: 16x faster than a GPU.

Llama 4 Maverick

2,500 tokens/sec

The largest and most powerful member of the Llama 4 family: 14x faster than a GPU.

“We’re excited to share the first models in the Llama 4 herd and partner with Cerebras to deliver the world’s fastest AI inference for them, which will enable people to build more personalized multimodal experiences. By delivering over 2,000 tokens per second for Scout – more than 30 times faster than closed models like ChatGPT or Anthropic, Cerebras is helping developers everywhere to move faster, go deeper, and build better than ever before.”

Ahmed Al-Dahle

VP of GenAI at Meta



Performance comparisons are based on third-party benchmarking or internal testing. Observed inference speed improvements versus GPU-based systems may vary depending on workload, configuration, date, and the models being tested.

info@cerebras.ai

1237 E. Arques Ave, Sunnyvale, CA 94085