Cloudflare AI — Intelligence at the Edge

Why Cloudflare AI

The Complete AI Stack at the Edge

From inference to observability to vector search — everything you need to build production AI applications, running on Cloudflare's global network.

🧠

Inference Everywhere

Run 50+ open-source models on serverless GPUs across 330+ cities. No infrastructure to manage — just deploy and scale.

🔍

Observe & Control

AI Gateway gives you caching, rate limiting, request retries, model fallback, and real-time analytics for every AI call.

📊

Vector Database

Vectorize enables semantic search, recommendations, and RAG at the edge. Built on Cloudflare's network for low latency.

🔒

Privacy-First

Your data stays on Cloudflare's network. No third-party hops. Built-in Firewall for AI detects prompt injections and unsafe content.

⚡

Serverless Scale

Pay per inference. No idle costs. Auto-scales from zero to global traffic with zero warm-up time.

🔌

Unified Platform

Workers + AI Gateway + Vectorize + R2 + D1 + KV — every piece of the platform integrates natively. No glue code needed.

Products

AI Platform, Built for Developers

Three products that work together to power your AI applications end-to-end.

Workers AI

Serverless GPU Inference

Run machine learning models on Cloudflare's network — no servers, no GPUs to manage. Invoke models from Workers, Pages, or directly via the API.

50+ open-source models including Llama, Mistral, DeepSeek, and more
Text generation, image generation, embeddings, speech, and classification
Pay-per-inference pricing — free tier available
Leonardo AI image generation & Deepgram speech models now available

          // Run inference from a Worker (env.AI is the Workers AI binding)
          const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
            prompt: "Explain edge computing"
          });

🧠

50+ Models

Llama 3.1 · Mistral · DeepSeek · Gemma · Whisper · Stable Diffusion · BGE Embeddings

AI Gateway

Observe, Control, and Optimize

A unified gateway for all your AI API calls. Cache responses, enforce rate limits, retry failures, and monitor usage — across any provider.

Universal endpoint supporting OpenAI, Anthropic, Google, and Workers AI
Smart caching, rate limiting, and automatic retries with fallback
Real-time analytics: tokens, costs, latency, error rates
Dynamic routing — auto-route to best-performing model
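
As a rough illustration of the universal endpoint, the sketch below builds a gateway URL and sends an OpenAI-style chat request through it. The URL shape (`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/...`) is assumed from Cloudflare's published pattern. The account ID, gateway name, and the `callThroughGateway` helper are placeholders introduced here, so verify them against your own account before relying on this.

```javascript
// Builds an AI Gateway URL for a given provider path.
// Assumption: the URL pattern below; accountId and gatewayId are placeholders.
function gatewayUrl(accountId, gatewayId, provider, path) {
  const base = "https://gateway.ai.cloudflare.com/v1";
  const cleanPath = path.replace(/^\/+/, ""); // drop any leading slashes
  return `${base}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}/${provider}/${cleanPath}`;
}

// Hypothetical helper: sends an OpenAI-style chat completion through the gateway.
// fetchImpl is injectable so the function can be exercised without network access.
async function callThroughGateway({ accountId, gatewayId, apiKey, body }, fetchImpl = fetch) {
  const url = gatewayUrl(accountId, gatewayId, "openai", "chat/completions");
  const res = await fetchImpl(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Gateway request failed with status ${res.status}`);
  }
  return res.json();
}
```

Because only the base URL changes, an existing provider client can usually be pointed at the gateway without touching the request body.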

🛡️

Unified AI Control Plane

One endpoint · Any provider · Full observability

Vectorize

Vector Database at the Edge

Build AI applications with semantic memory. Vectorize is Cloudflare's native vector database — no external services, no data leaving the network.

Serverless vector database — automatically scales with your data
Semantic search, RAG, recommendations, anomaly detection
Works with BGE embeddings from Workers AI
Integrates with Workers, D1, R2, and AI Gateway
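
To make the embeddings-plus-Vectorize pairing concrete, here is a minimal indexing sketch. It assumes a Workers AI binding named `env.AI` and a Vectorize binding named `env.DOCS_INDEX` (both names are illustrative and depend on your configuration), and it assumes the `@cf/baai/bge-base-en-v1.5` embedding model is available on your account.

```javascript
// Embeds documents with a BGE model and upserts them into a Vectorize index.
// Assumptions: env.AI is a Workers AI binding and env.DOCS_INDEX is a
// Vectorize binding; both binding names are illustrative.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

async function indexDocuments(env, docs) {
  // The embedding model takes an array of strings under `text` and
  // returns one vector per input under `data`.
  const { data } = await env.AI.run(EMBEDDING_MODEL, {
    text: docs.map((d) => d.text),
  });

  // Keep the source text in metadata so it can be shown at query time.
  const vectors = docs.map((d, i) => ({
    id: d.id,
    values: data[i],
    metadata: { text: d.text },
  }));

  await env.DOCS_INDEX.upsert(vectors);
  return vectors.length;
}
```

Storing the original text as metadata is a design choice for small documents; larger payloads may be better kept in R2 or D1 with only a key stored alongside the vector.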

🔢

Vector Database

Semantic search · RAG · Recommendations

What's New

AI Week 2025 & Beyond

Cloudflare AI Week 2025 brought major updates across the platform. Here's what shipped.

🖼️

Leonardo AI Models

State-of-the-art image generation models now available on Workers AI. Generate and edit images at the edge.

🎤

Deepgram Speech

Text-to-speech and speech-to-text models from Deepgram — build voice applications entirely on Cloudflare.

🛡️

Firewall for AI

Detect prompt injections, unsafe content, and shadow AI usage. Block malicious requests before they reach the model.

🤝

OpenAI Open Models

Cloudflare is a Day 0 launch partner for OpenAI's new open-weight models — available directly on Workers AI.

🔄

Dynamic Routing

AI Gateway now supports dynamic model routing — automatically route requests to the best-performing or cheapest model.

📈

Production Ready

Workers AI is Generally Available with enterprise-grade SLAs, higher rate limits, and dedicated support options.

Use Cases

What Teams Are Building

From startups to enterprises — real applications powered by Cloudflare AI.

💬

AI Customer Support

Deploy a smart chatbot that answers product questions, processes returns, and escalates to humans — all at the edge with sub-100ms response times.

Workers AI + AI Gateway

🔍

Semantic Product Search

Replace keyword search with vector embeddings. Let customers search by meaning, not just keywords. 10x improvement in discovery rates.

Vectorize + BGE Embeddings

📝

Content Moderation

Use Llama Guard on Workers AI to automatically flag unsafe content. Cache moderation results with AI Gateway to reduce costs by 60%.

Workers AI + AI Gateway
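
A moderation call with gateway caching might look roughly like the sketch below. Several details are assumptions rather than confirmed behavior: the model ID `@cf/meta/llama-guard-3-8b`, the gateway name `moderation-gateway`, and the shape of the verdict (a string beginning with "safe" or "unsafe", or an object with a boolean `safe` field). Check the model's documented output before using this in production.

```javascript
const GUARD_MODEL = "@cf/meta/llama-guard-3-8b"; // assumed model ID
const GATEWAY_ID = "moderation-gateway"; // hypothetical gateway name

// Interprets a guard verdict. Assumption: the verdict is either a string that
// starts with "safe" or "unsafe", or an object with a boolean `safe` field.
function isFlagged(verdict) {
  if (typeof verdict === "string") {
    return verdict.trim().toLowerCase().startsWith("unsafe");
  }
  return verdict != null && verdict.safe === false;
}

async function moderate(env, text) {
  const result = await env.AI.run(
    GUARD_MODEL,
    { messages: [{ role: "user", content: text }] },
    // Route the call through AI Gateway so identical requests can be
    // answered from cache for up to an hour.
    { gateway: { id: GATEWAY_ID, cacheTtl: 3600 } }
  );
  return { flagged: isFlagged(result.response) };
}
```

Caching only helps when the same content is submitted repeatedly, for example reposted comments, so the savings depend on how repetitive your traffic is.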

🎨

Image Generation API

Build an image generation service using Leonardo AI on Workers AI. Generate, transform, and serve images — all on Cloudflare's network.

Workers AI + R2 + Images

📄

RAG-Powered Documentation

Index your docs with Vectorize, embed queries with Workers AI, and answer with Llama. Production-ready AI docs in under 100 lines of code.

Vectorize + Workers AI
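
The retrieve-then-answer flow described in this card can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: `env.AI` and `env.DOCS_INDEX` are illustrative binding names, each stored vector is assumed to carry its source text in `metadata.text`, and the prompt wording is arbitrary.

```javascript
const QUERY_EMBED_MODEL = "@cf/baai/bge-base-en-v1.5";
const ANSWER_MODEL = "@cf/meta/llama-3.1-8b-instruct";

// Answers a question from indexed docs: embed the query, fetch the nearest
// chunks from Vectorize, then ask the chat model to answer from that context.
// Assumption: each match carries its source text in metadata.text.
async function answerFromDocs(env, question, topK = 3) {
  const embedding = await env.AI.run(QUERY_EMBED_MODEL, { text: [question] });

  const results = await env.DOCS_INDEX.query(embedding.data[0], {
    topK,
    returnMetadata: "all",
  });

  const context = results.matches
    .map((m) => m.metadata?.text ?? "")
    .filter(Boolean)
    .join("\n---\n");

  const { response } = await env.AI.run(ANSWER_MODEL, {
    messages: [
      {
        role: "system",
        content: `Answer using only the context below. If the answer is not there, say so.\n\n${context}`,
      },
      { role: "user", content: question },
    ],
  });

  return { answer: response, sources: results.matches.map((m) => m.id) };
}
```

Returning the matched IDs alongside the answer makes it straightforward to link each response back to the pages it was drawn from.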

🌍

Multi-Language Translation

Translate content into 50+ languages at the edge. Use AI Gateway to cache frequent translations and monitor translation costs across your org.

Workers AI + AI Gateway
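
A translation helper along these lines could look like the sketch below. It assumes the `@cf/meta/m2m100-1.2b` model with `text`, `source_lang`, and `target_lang` inputs, and a hypothetical gateway named `translation-gateway` so that repeated translations can be served from cache; the one-day TTL is an arbitrary choice.

```javascript
const TRANSLATION_MODEL = "@cf/meta/m2m100-1.2b"; // assumed model ID
const TRANSLATION_GATEWAY = "translation-gateway"; // hypothetical gateway name

// Translates text via Workers AI, routed through AI Gateway for caching.
// Assumption: the model returns its output under `translated_text`.
async function translate(env, text, targetLang, sourceLang = "en") {
  const { translated_text } = await env.AI.run(
    TRANSLATION_MODEL,
    { text, source_lang: sourceLang, target_lang: targetLang },
    // Cache identical requests for a day (TTL is in seconds).
    { gateway: { id: TRANSLATION_GATEWAY, cacheTtl: 86400 } }
  );
  return translated_text;
}
```

Inside a Worker this would be called as `await translate(env, "Hello", "fr")`, with per-request usage then visible in the gateway's analytics.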

Build Your First AI App in Minutes