🚀 AI Week 2025: Now GA

Intelligence at the Edge

Run serverless AI models on Cloudflare's global network. 50+ models, built-in observability, and a vector database, all in one platform.

50+ AI Models
330+ Cities
<10ms Latency
🌐
Run AI Everywhere
Serverless GPUs on Cloudflare's Network

The Complete AI Stack at the Edge

From inference to observability to vector search, everything you need to build production AI applications runs on Cloudflare's global network.

🧠

Inference Everywhere

Run 50+ open-source models on serverless GPUs across 330+ cities. No infrastructure to manage: just deploy and scale.

🔍

Observe & Control

AI Gateway gives you caching, rate limiting, request retries, model fallback, and real-time analytics for every AI call.

📊

Vector Database

Vectorize enables semantic search, recommendations, and RAG at the edge. Built on Cloudflare's network for low latency.

🔒

Privacy-First

Your data stays on Cloudflare's network. No third-party hops. Built-in Firewall for AI detects prompt injections and unsafe content.

⚡

Serverless Scale

Pay per inference. No idle costs. Auto-scales from zero to global traffic with zero warm-up time.

🔌

Unified Platform

Workers + AI Gateway + Vectorize + R2 + D1 + KV: every piece of the platform integrates natively. No glue code needed.

AI Platform, Built for Developers

Three products that work together to power your AI applications end-to-end.

Workers AI

Serverless GPU Inference

Run machine learning models on Cloudflare's network with no servers and no GPUs to manage. Invoke models from Workers, Pages, or directly via the API.

  • 50+ open-source models including Llama, Mistral, DeepSeek, and more
  • Text generation, image generation, embeddings, speech, and classification
  • Pay-per-inference pricing, with a free tier available
  • Leonardo AI image generation & Deepgram speech models now available
// Run inference from a Worker through the AI binding
export default {
  async fetch(request, env) {
    const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Explain edge computing"
    });
    return Response.json(result);
  }
};
🧠

50+ Models

Llama 3.1 · Mistral · DeepSeek · Gemma ·
Whisper · Stable Diffusion · BGE Embeddings
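Calling a model "directly via the API" means sending an HTTP request instead of using a Worker binding. The sketch below assumes the Cloudflare REST endpoint shape `/client/v4/accounts/{account_id}/ai/run/{model}` with a bearer token; `buildRunRequest` and its parameter names are illustrative, not part of any SDK.

```javascript
// Sketch: assemble a REST request for a Workers AI model.
// buildRunRequest, accountId, and apiToken are hypothetical names for illustration.
function buildRunRequest(accountId, apiToken, model, input) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(input),
    },
  };
}

// Usage (performs a real network call, so it is left commented out):
// const { url, init } = buildRunRequest("ACCOUNT_ID", "API_TOKEN",
//   "@cf/meta/llama-3.1-8b-instruct", { prompt: "Explain edge computing" });
// const response = await fetch(url, init);
```

Keeping request construction separate from `fetch` makes the URL and headers easy to inspect before anything is sent.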

AI Gateway

Observe, Control, and Optimize

A unified gateway for all your AI API calls. Cache responses, enforce rate limits, retry failures, and monitor usage across any provider.

  • Universal endpoint supporting OpenAI, Anthropic, Google, and Workers AI
  • Smart caching, rate limiting, and automatic retries with fallback
  • Real-time analytics: tokens, costs, latency, error rates
  • Dynamic routing: auto-route to the best-performing model
🛡️

Unified AI Control Plane

One endpoint · Any provider · Full observability
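"One endpoint" can be pictured as a single base URL with the provider named in the path. A minimal sketch, assuming the gateway URL pattern `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/...`; `gatewayUrl` is a hypothetical helper, and the account and gateway IDs are placeholders.

```javascript
// Sketch: build provider URLs that all pass through one AI Gateway.
// gatewayUrl is an illustrative helper, not an official API.
const GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1";

function gatewayUrl(accountId, gatewayId, provider, path) {
  // Drop leading slashes so the joined URL has no empty path segment.
  const cleanPath = path.replace(/^\/+/, "");
  return `${GATEWAY_HOST}/${accountId}/${gatewayId}/${provider}/${cleanPath}`;
}

// The same gateway fronts different providers; only the path segment changes.
const openAiChat = gatewayUrl("ACCOUNT_ID", "my-gateway", "openai", "/chat/completions");
const workersAi = gatewayUrl("ACCOUNT_ID", "my-gateway", "workers-ai", "@cf/meta/llama-3.1-8b-instruct");
```

Requests sent to these URLs still carry each provider's own credentials and body format; the gateway sits in between to cache, rate limit, and log them.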

Vectorize

Vector Database at the Edge

Build AI applications with semantic memory. Vectorize is Cloudflare's native vector database: no external services, no data leaving the network.

  • Serverless vector database that automatically scales with your data
  • Semantic search, RAG, recommendations, anomaly detection
  • Works with BGE embeddings from Workers AI
  • Integrates with Workers, D1, R2, and AI Gateway
🔢

Vector Database

Semantic search · RAG · Recommendations
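Semantic search with BGE embeddings comes down to two steps: embed text into vectors, then ask the index for the nearest ones. The sketch below assumes a Worker with an `AI` binding and a Vectorize binding named `VECTORIZE` (both names are configuration choices), and the `@cf/baai/bge-base-en-v1.5` embedding model; `indexDocuments` and `semanticSearch` are illustrative helpers.

```javascript
// Sketch: index documents and run a semantic query.
// Assumes env.AI (Workers AI binding) and env.VECTORIZE (Vectorize binding).
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

async function indexDocuments(env, docs) {
  // One embedding per input string comes back in `data`, in input order.
  const { data } = await env.AI.run(EMBEDDING_MODEL, { text: docs.map((d) => d.text) });
  const vectors = docs.map((d, i) => ({
    id: d.id,
    values: data[i],
    metadata: { text: d.text },
  }));
  return env.VECTORIZE.upsert(vectors);
}

async function semanticSearch(env, query, topK = 3) {
  // Embed the query with the same model used at indexing time.
  const { data } = await env.AI.run(EMBEDDING_MODEL, { text: [query] });
  const result = await env.VECTORIZE.query(data[0], { topK });
  return result.matches.map((m) => ({ id: m.id, score: m.score }));
}
```

Using the same embedding model for indexing and querying matters: vectors from different models are not comparable.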

AI Week 2025 & Beyond

Cloudflare AI Week 2025 brought major updates across the platform. Here's what shipped.

🖼️

Leonardo AI Models

State-of-the-art image generation models now available on Workers AI. Generate and edit images at the edge.

🎤

Deepgram Speech

Text-to-speech and speech-to-text models from Deepgram: build voice applications entirely on Cloudflare.

🛡️

Firewall for AI

Detect prompt injections, unsafe content, and shadow AI usage. Protect your AI applications by screening requests before they reach the model.

🤝

OpenAI Open Models

Cloudflare is a Day 0 launch partner for OpenAI's new open-weight models, available directly on Workers AI.

🔄

Dynamic Routing

AI Gateway now supports dynamic model routing: automatically route requests to the best-performing or cheapest model.
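The idea behind routing with fallback can be illustrated in a few lines of application code. This is a client-side sketch of the concept only, not how AI Gateway implements dynamic routing; `runWithFallback` and the caller-supplied `runModel` function are hypothetical.

```javascript
// Sketch: try candidate models in priority order, falling back on failure.
// runModel is a caller-supplied async function, e.g. a thin wrapper around env.AI.run.
async function runWithFallback(models, input, runModel) {
  const errors = [];
  for (const model of models) {
    try {
      const output = await runModel(model, input);
      return { model, output }; // report which model actually answered
    } catch (err) {
      errors.push(`${model}: ${err.message}`);
    }
  }
  throw new Error(`All models failed: ${errors.join("; ")}`);
}
```

Ordering the `models` array by cost or measured latency gives a crude version of "cheapest first" or "fastest first" routing.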

📈

Production Ready

Workers AI is Generally Available with enterprise-grade SLAs, higher rate limits, and dedicated support options.

What Teams Are Building

From startups to enterprises: real applications powered by Cloudflare AI.

💬

AI Customer Support

Deploy a smart chatbot that answers product questions, processes returns, and escalates to humans, all at the edge with sub-100ms response times.

Workers AI + AI Gateway
🔍

Semantic Product Search

Replace keyword search with vector embeddings. Let customers search by meaning, not just keywords. 10x improvement in discovery rates.

Vectorize + BGE Embeddings
📝

Content Moderation

Use Llama Guard on Workers AI to automatically flag unsafe content. Cache moderation results with AI Gateway to reduce costs by 60%.

Workers AI + AI Gateway
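Caching saves money on moderation because identical content only needs one verdict. The in-memory sketch below illustrates that idea in application code; AI Gateway caches at the gateway level instead, and `createCachedModerator` plus the caller-supplied `classify` function (imagined here as a wrapper around a Llama Guard call returning `{ safe: boolean }`) are hypothetical.

```javascript
// Sketch: reuse moderation verdicts for repeated content.
// classify is a hypothetical async function returning { safe: boolean }.
function createCachedModerator(classify) {
  const cache = new Map();
  let misses = 0;
  return {
    async check(text) {
      // Normalize so trivially different copies share one cache entry.
      const key = text.trim().toLowerCase();
      if (!cache.has(key)) {
        misses += 1;
        cache.set(key, await classify(text));
      }
      return cache.get(key);
    },
    stats: () => ({ misses, entries: cache.size }),
  };
}
```

Only cache misses trigger a model call, so the savings depend entirely on how often the same content recurs.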
🎨

Image Generation API

Build an image generation service using Leonardo AI on Workers AI. Generate, transform, and serve images, all on Cloudflare's network.

Workers AI + R2 + Images
📄

RAG-Powered Documentation

Index your docs with Vectorize, embed queries with Workers AI, and answer with Llama. Production-ready AI docs in under 100 lines of code.

Vectorize + Workers AI
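The three RAG steps named above (embed the query, retrieve from Vectorize, answer with Llama) might look roughly like this. It is a sketch under assumptions: the `AI` and `VECTORIZE` binding names are configuration choices, indexed vectors are assumed to carry their source passage in `metadata.text`, and `buildRagMessages` and `answerFromDocs` are illustrative helpers.

```javascript
// Sketch: retrieval-augmented answering over indexed docs.
function buildRagMessages(question, passages) {
  // Number the passages so the model can tell them apart.
  const context = passages.map((p, i) => `[${i + 1}] ${p}`).join("\n");
  return [
    { role: "system", content: `Answer using only the context below.\n${context}` },
    { role: "user", content: question },
  ];
}

async function answerFromDocs(env, question, topK = 3) {
  // 1. Embed the question.
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  // 2. Retrieve the closest passages (assumes metadata.text was stored at index time).
  const { matches } = await env.VECTORIZE.query(data[0], { topK, returnMetadata: "all" });
  const passages = matches.map((m) => m.metadata.text);
  // 3. Ask the model to answer from those passages.
  const { response } = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: buildRagMessages(question, passages),
  });
  return response;
}
```

Restricting the system prompt to the retrieved context is a common way to keep answers grounded in the docs rather than the model's general knowledge.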
🌐

Multi-Language Translation

Translate content into 50+ languages at the edge. Use AI Gateway to cache frequent translations and monitor translation costs across your org.

Workers AI + AI Gateway

Build Your First AI App in Minutes

No servers. No GPUs to manage. Just your code and Cloudflare's global network. Start with the free tier, no credit card required.