The fastest way to deploy Llama models
Run Llama 3.3, Llama 4 Scout, and Llama 4 Maverick on Cerebras, available now. This partnership pairs Meta’s most advanced models with unmatched inference speed, unlocking real-time reasoning, voice, and agentic AI at scale.


“We’re excited to share the first models in the Llama 4 herd and partner with Cerebras to deliver the world’s fastest AI inference for them, which will enable people to build more personalized multimodal experiences. By delivering over 2,000 tokens per second for Scout, more than 30 times faster than closed models like ChatGPT or Anthropic, Cerebras is helping developers everywhere to move faster, go deeper, and build better than ever before.”


