Cloudflare’s AI Platform: an inference layer designed for agents

AI models are changing quickly: the best model to use for agentic coding today might in three months be a completely different model from a different provider. On top of this, real-world use cases often require calling more than one model. Your customer support agent might use a fast, cheap model to classify a user's message; a large, reasoning model to plan its actions; and a lightweight model to execute individual tasks.

This means you need access to all the models, without tying yourself financially and operationally to a single provider. You also need the right systems in place to monitor costs across providers, ensure reliability when one of them has an outage, and manage latency no matter where your users are.

These challenges are present whenever you’re building with AI, but they get even more pressing when you’re building agents. A simple chatbot might make one inference call per user prompt. An agent might chain ten calls together to complete a single task, and suddenly a single slow provider doesn't add 50ms, it adds 500ms. One failed request isn't just a retry; it can set off a cascade of downstream failures.
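The arithmetic behind that claim can be made concrete. What follows is a minimal sketch rather than a measurement: the function names are hypothetical, the 50ms overhead and ten-call chain come from the example above, and the 99% per-call success rate is an assumed figure chosen only for illustration. It also assumes the calls run sequentially and fail independently.

```python
def chained_overhead_ms(calls: int, per_call_overhead_ms: float) -> float:
    """Extra latency added to a task when every call in a sequential chain is slowed."""
    return calls * per_call_overhead_ms


def chain_success_probability(calls: int, per_call_success: float) -> float:
    """Chance that every call in the chain succeeds, assuming independent failures."""
    return per_call_success ** calls


# One call per prompt (chatbot) versus ten chained calls (agent).
chatbot_overhead = chained_overhead_ms(1, 50)   # 50 ms
agent_overhead = chained_overhead_ms(10, 50)    # 500 ms

# Assumed 99% success per call: the ten-call chain completes cleanly
# only about 90% of the time (0.99 ** 10 is roughly 0.904).
agent_success = chain_success_probability(10, 0.99)
```

Under these assumptions, a provider that looks almost perfectly reliable for a single call still fails roughly one agent task in ten, which is why per-call latency and reliability matter more as chains get longer.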

Since launching AI Gateway and Workers AI, we’ve seen incredible Continue reading

Building the foundation for running extra-large language models

An agent needs to be powered by a large language model. A few weeks ago, we announced that Workers AI is officially entering the arena for hosting large open-source models like Moonshot’s Kimi K2.5. Since then, we’ve made Kimi K2.5 3x faster and have more model additions in-flight. These models have been the backbone of a lot of the agentic products, harnesses, and tools that we have been launching this week. 

Hosting AI models is an interesting challenge: it requires a delicate balance between software and very, very expensive hardware. At Cloudflare, we’re good at squeezing every bit of efficiency out of our hardware through clever software engineering. This is a deep dive on how we’re laying the foundation to run extra-large language models.

Hardware configurations

As we mentioned in our previous Kimi K2.5 blog post, we’re using a variety of hardware configurations in order to best serve models. A lot of hardware configurations depend on the size of inputs and outputs that users are sending to the model. For example, if you are using a model to write fanfiction, you might give it a few small prompts (input tokens) while asking it to generate Continue reading

Artifacts: versioned storage that speaks Git

Agents have changed how we think about source control, file systems, and persisting state. Developers and agents are generating more code than ever (more code will be written over the next 5 years than in all of programming history), and that has driven an order-of-magnitude change in the scale of the systems needed to meet this demand. Source control platforms are especially struggling here: they were built to meet the needs of humans, not a 10x change in volume driven by agents that never sleep or tire and can work on several issues at once.

We think there’s a need for a new primitive: a distributed, versioned filesystem that’s built for agents first and foremost, and that can serve the types of applications that are being built today.

We’re calling this Artifacts: a versioned file system that speaks Git. You can create repositories programmatically, alongside your agents, sandboxes, Workers, or any other compute paradigm, and connect to it from any regular Git client.

Want to give every agent session a repo? Artifacts can do it. Every sandbox instance? Also Artifacts. Want to create 10,000 forks from a known-good starting point? You guessed it: Artifacts again. Artifacts exposes a REST Continue reading

AI Search: the search primitive for your agents

Every agent needs search: coding agents search millions of files across repos, and support agents search customer tickets and internal docs. The use cases are different, but the underlying problem is the same: get the right information to the model at the right time.

If you're building search yourself, you need a vector index, an indexing pipeline that parses and chunks your documents, and something to keep the index up to date when your data changes. If you also need keyword search, that's a separate index and fusion logic on top. And if each of your agents needs its own searchable context, you're setting all of that up per agent. 
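To give a sense of what "fusion logic" involves, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a vector ranking with a keyword ranking. This is an illustration only: the post does not say which fusion method AI Search uses, the function name and document IDs are hypothetical, and `k=60` is simply a conventional default for RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the scores are summed across lists.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector index and a keyword (BM25-style) index.
vector_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["b", "a", "d", "c"]: documents found by both indexes rise to the top.
```

RRF works on ranks rather than raw scores, so it avoids having to normalize cosine similarities against BM25 scores, which live on different scales.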

AI Search (formerly AutoRAG) is the plug-and-play search primitive you need. You can dynamically create instances, give them your data, and search from a Worker, the Agents SDK, or the Wrangler CLI. Here's what we're shipping:

  • Hybrid search. Enable both semantic and keyword matching in the same query. Vector search and BM25 run in parallel and results are fused. (The search on our blog is now powered by AI Search. Try the magnifying glass icon at the top right.)

  • Built-in storage and index. New instances come with Continue reading

Deploy Postgres and MySQL databases with PlanetScale + Workers

Cloudflare announced our PlanetScale partnership last September to give Cloudflare Workers direct access to Postgres and MySQL databases for fast, full-stack applications.

Soon, we’re bringing our technologies even closer: you’ll be able to create PlanetScale Postgres and MySQL databases directly from the Cloudflare dashboard and API, and have them billed to your Cloudflare account. 

You choose the data storage that fits your Worker application needs and keep a single system for billing as a Cloudflare self-serve or enterprise customer. Cloudflare credits like those given in our startup program or Cloudflare committed spend can be used towards PlanetScale databases.

Postgres & MySQL for Workers

SQL relational databases like Postgres and MySQL are a foundation of modern applications. In particular, Postgres has risen in developer popularity with its rich tooling ecosystem (ORMs, GUIs, etc) and extensions like pgvector for building vector search in AI-driven applications. Postgres is the default choice for most developers who need a powerful, flexible, and scalable database to power their applications.

You can already connect your PlanetScale account and create Postgres databases directly from the Cloudflare dashboard for your Workers. Starting next month, a new Cloudflare subscription will bill for new PlanetScale databases direct to your Cloudflare Continue reading

Cloudflare Email Service: now in public beta. Ready for your agents

Email is the most accessible interface in the world. It is ubiquitous. There’s no need for a custom chat application, no custom SDK for each channel. Everyone already has an email address, which means everyone can already interact with your application or agent. And your agent can interact with anyone.

If you are building an application, you already rely on email for signups, notifications, and invoices. Increasingly, it is not just your application logic that needs this channel. Your agents do, too. During our private beta, we talked to developers who are building exactly this: customer support agents, invoice processing pipelines, account verification flows, multi-agent workflows. All built on top of email. The pattern is clear: email is becoming a core interface for agents, and developers need infrastructure purpose-built for it.

Cloudflare Email Service is that piece. With Email Routing, you can receive email to your application or agent. With Email Sending, you can reply to emails or send outbound messages to notify your users when your agents are done doing work. And with the rest of the developer platform, you can build a full email client, with the Agents SDK onEmail hook available as native functionality.

Today, as part of Continue reading

Testing FRRouting Pull Requests with netlab

Every other blue moon, I discover a bug in FRRouting (example). Because the FRRouting developers care about their work, it usually gets fixed within a few days, often resulting in a “can you test this PR?” question.

It turns out that’s surprisingly easy to do with netlab – here’s the step-by-step procedure (assuming you already have the topology file that reproduced the bug):

Beyond the Prompt: AI Agent Design Patterns and the New Governance Gap

If you are treating Large Language Models (LLMs) like simple question-and-answer machines, you are leaving their most transformative potential on the table. The industry has officially shifted from zero-shot prompting to structured AI agent design patterns and agentic workflows where AI iteratively reasons, uses external tools, and collaborates to solve complex engineering problems. These design patterns are the architectural blueprints that determine how autonomous Agentic AI systems work and interact with your infrastructure.

But as these systems proliferate faster than organizations can govern them, they introduce a critical AI agent security risk: By the end of 2026, 40% of enterprise applications will feature embedded AI agents, and those teams will urgently need purpose-built strategies to govern this new autonomous workforce before it becomes the next major shadow IT crisis.

Before you can secure these autonomous systems, you have to understand how they are built. Here is a technical breakdown of the current AI Agent design patterns you need to know, and the specific security blind spots each design pattern creates.

1. The Foundational Execution Patterns

Building reliable AI systems comes down to how you route the cognitive load. Here are the three baseline structural patterns:

A. The Single Agent Continue reading

D2DO300: Open Source Malware!

Malware has shifted from phishing expeditions to open source packages, domains, and repositories. Ned and Kyler welcome Jenn Gile, co-founder of Open Source Malware, to discuss how malware is making its way into open source software. Together they break down NPM compromises, AI-driven infiltration, malicious agent skills, and more. Episode Links: Open Source Malware –... Read more »

Project Think: building the next generation of AI agents on Cloudflare

Today, we're introducing Project Think: the next generation of the Agents SDK. Project Think is a set of new primitives for building long-running agents (durable execution, sub-agents, sandboxed code execution, persistent sessions) and an opinionated base class that wires them all together. Use the primitives to build exactly what you need, or use the base class to get started fast.

Something happened earlier this year that changed how we think about AI. Tools like Pi, OpenClaw, Claude Code, and Codex proved a simple but powerful idea: give an LLM the ability to read files, write code, execute it, and remember what it learned, and you get something that looks less like a developer tool and more like a general-purpose assistant.

These coding agents aren't just writing code anymore. People are using them to manage calendars, analyze datasets, negotiate purchases, file taxes, and automate entire business workflows. The pattern is always the same: the agent reads context, reasons about it, writes code to take action, observes the result, and iterates. Code is the universal medium of action.
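The loop described above (read context, reason, write code, observe, iterate) can be sketched in a few lines. This is a toy illustration, not how any of the tools named here are implemented: `run_agent`, `toy_reason`, and `toy_execute` are hypothetical names, the "model" is a hard-coded stub, and `eval` with empty builtins stands in for real sandboxed execution (it is not a security boundary).

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, object]]


def run_agent(
    task: str,
    reason: Callable[[History], Optional[str]],
    execute: Callable[[str], object],
    max_steps: int = 5,
) -> History:
    """Reason over the history, run the proposed code, record the result, repeat."""
    history: History = [("task", task)]
    for _ in range(max_steps):
        code = reason(history)           # reason about context and write code
        if code is None:                 # the reasoner decides the task is done
            break
        result = execute(code)           # take action by running the code
        history.append((code, result))   # observe the result for the next step
    return history


def toy_reason(history: History) -> Optional[str]:
    """Stub standing in for an LLM: proposes one expression per step."""
    last_result = history[-1][1]
    if len(history) == 1:
        return "2 + 3"
    if last_result == 5:
        return f"{last_result} * 10"
    return None


def toy_execute(code: str) -> object:
    """Evaluates an arithmetic expression. NOT a sandbox; for illustration only."""
    return eval(code, {"__builtins__": {}}, {})


history = run_agent("compute a value", toy_reason, toy_execute)
# history == [("task", "compute a value"), ("2 + 3", 5), ("5 * 10", 50)]
```

The `max_steps` cap is a deliberate design choice in this sketch: a loop that feeds its own output back into itself needs an explicit bound so that a confused reasoner cannot iterate forever.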

Our team has been using these coding agents every day. And we kept running into the same walls:

Introducing Agent Lee – a new interface to the Cloudflare stack

While there have been small improvements along the way, the interface of technical products has not really changed since the dawn of the Internet. It still means clicking five pages deep, cross-referencing logs across tabs, and hunting for hidden toggles.

AI gives us the opportunity to rethink all that. Instead of spreading complexity over a sprawling graphical user interface, what if you could describe in plain language what you wanted to achieve?

This is the future — and we’re launching it today. We didn’t want to just put an agent in a dashboard. We wanted to create an entirely new way to interact with our entire platform. Any task, any surface, a single prompt.

Introducing Agent Lee.

Agent Lee is an in-dashboard AI assistant that understands your Cloudflare account. 

It can help you with troubleshooting, which, today, is a manual grind. If your Worker starts returning 503s at 02:00 UTC, finding the root cause (be it an R2 bucket, a misconfigured route, or a hidden rate limit) means opening half a dozen tabs and hoping you recognize the pattern. Most developers don't have a teammate who knows the entire platform standing over their shoulder at 2 a.m. Agent Continue reading

BGP Labs: Graceful Degradation for Unsupported Devices

A few weeks ago, I described the changes in the online BGP labs that allow you to use most of the common network operating systems as “external” routers. However, while we keep improving it, netlab still can’t configure all BGP features on all supported devices (PRs from Nokia and Mikrotik fans would be highly appreciated 😎), which means that it’s possible to configure your environment in a way where some of the more complex labs would simply fail to start.

The limited choice of devices for external routers was always well-documented (example), but if you insisted on using unsupported devices, the lab would fail to start with an error message, and you’d have to tweak the lab topology (example). Wouldn’t it be better to start the lab with a warning?

PP105: Cybercrime Has Gone Industrial: Insights from HPE Threat Labs (Sponsored)

Threat actors are behaving more like professional organizations in an effort to launch more effective and profitable attacks. We explore this and other themes from the latest Threat Labs report from HPE, our sponsor for today’s Packet Protector episode. We also look at how older vulnerabilities are still contributing to today’s exploits, why security organizations... Read more »

HW075: Speedtest Certified

Speedtest Certified is a network connectivity verification program for properties and venues, allowing them to prove the performance of their Wi-Fi. Alan Blake of Ookla joins the show to break down what the certification actually measures, how assessments are performed, and what it means for network owners as well as Wi-Fi professionals. This is... Read more »

Beyond the VPN: Cloudflare Mesh builds a private network for the age of AI agents

Cloud connectivity has long been a manual, fragmented headache for DevOps teams. On Tuesday, Cloudflare moved to bridge that gap with the launch of Cloudflare Mesh, a private networking service designed to unify multi-cloud environments into a single secure fabric for humans, agents, and code alike.

Cloudflare, which provides services for roughly 20% of the web, hopes Mesh will become a new fusion point for cloud connectivity among humans, agents, and code.

Private networking: a definition

To understand Mesh, one must first define Cloudflare’s specific flavor of “private networking.” Unlike a traditional private cloud, this model connects internal resources, including servers, databases, and development tool environments, to the wider world of the web, without opening ports on a company’s firewall.

“As autonomous agents become more common, businesses must rethink access models or risk insecure workarounds for the ‘new class of client’ that needs secure access to internal resources.” (Christian Reilly, Cloudflare)

Essentially, Cloudflare Mesh helps software developers and operations teams to encrypt every connection point, without ever exposing internal infrastructure and data to Continue reading

Four public live production flow analytics dashboards

The following publicly accessible dashboards show live data from operational networks, including: an AI/ML RoCEv2 fabric, a world-wide Kubernetes cluster, and an Internet Exchange Provider (IXP). Click on the [ LIVE DASHBOARD ] link under each screen capture to access the live dashboard.

San Diego Supercomputer Center Expanse Cluster AI/ML dashboard using ai-metrics application. See AI Metrics with Prometheus and Grafana for detailed, step-by-step instructions for setting up the monitoring and dashboard.

San Diego Supercomputer Center Expanse Cluster AI/ML traffic matrix using heatmap application. See Real-time visualization of AI / ML traffic matrix for an explanation of the chart with examples.

National Research Platform Nautilus Cluster GPU, CPU, and network resources in a world-wide Kubernetes cluster using sunburst application. See Real-time Kubernetes cluster monitoring example for more details and step-by-step instructions for deploying monitoring.

San Francisco Metropolitan Internet Exchange overall traffic dashboard using ixp-metrics application. See Internet eXchange Provider (IXP) Metrics for detailed, step-by-step instructions for setting up overall exchange traffic and per-member peering traffic dashboards.

Live Dashboards maintains a current list of publicly accessible dashboards. If you have a dashboard to share, would like help learning Continue reading

NB570: Project Glasswing’s FUD and Thunder; Au Revoir Windows, Bonjour Linux

Take a Network Break! We commence with a red alert on FastMCP, and then debate whether Anthropic’s Project Glasswing is a marketing stunt or a reasonable response to the growing ability of AI models to find and exploit software vulnerabilities. Iran targets US OT networks, startup Aria Networks unveils Ethernet switches purpose-built for AI factories,... Read more »