Redirects for AI Training enforces canonical content

Cloudflare's Wrangler CLI has shipped several major versions over the past six years, each introducing significant changes to commands, configuration, or how developers interact with the platform. Like any actively maintained open-source project, we keep documentation for older versions available. The v1 documentation carries a deprecation banner, a noindex meta tag, and canonical tags pointing to current docs. Every advisory signal says the same thing: this content is outdated, look elsewhere. AI training crawlers don't reliably honor those signals.

We use AI Crawl Control on developers.cloudflare.com, so we know that bots in the AI Crawler category made 4.8 million visits over the last 30 days, and that they consumed deprecated content at the same rate as current content. The advisory signals made no measurable difference. The effect is cumulative because AI agents don't always fetch content live; they draw on trained models. When crawlers ingest deprecated docs, agents inherit outdated foundations.

Today, we're launching Redirects for AI Training so you can ensure that verified AI training crawlers are redirected to up-to-date content. Your existing canonical tags become HTTP 301 redirects for verified AI training crawlers, automatically, with one toggle, on all paid Continue reading

Unweight: how we compressed an LLM 22% without sacrificing quality

Running inference within 50ms of 95% of the world's Internet-connected population means being ruthlessly efficient with GPU memory. Last year we improved memory utilization with Infire, our Rust-based inference engine, and eliminated cold-starts with Omni, our model scheduling platform. Now we are tackling the next big bottleneck in our inference platform: model weights.

Generating a single token from an LLM requires reading every model weight from GPU memory. On the NVIDIA H100 GPUs we use in many of our datacenters, the tensor cores can process data nearly 600 times faster than memory can deliver it, leading to a bottleneck not in compute, but memory bandwidth. Every byte that crosses the memory bus is a byte that could have been avoided if the weights were smaller.
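A back-of-envelope roofline makes the bottleneck concrete. The numbers below are illustrative, not Cloudflare's measurements; the bandwidth figure is the H100 SXM spec-sheet peak and the model size is a hypothetical 70B-parameter model:

```python
# Decoding one token reads every weight once, so peak memory bandwidth
# caps tokens/sec per sequence no matter how fast the tensor cores are.
model_bytes = 70e9 * 2        # 70B parameters at 2 bytes/weight (bf16)
hbm_bandwidth = 3.35e12       # H100 SXM HBM3 peak, ~3.35 TB/s

max_tokens_per_sec = hbm_bandwidth / model_bytes
print(f"{max_tokens_per_sec:.1f} tokens/s ceiling")  # roughly 24 tokens/s
```

Shrinking the weights moves this ceiling directly: 20% fewer bytes crossing the memory bus means roughly 20% more tokens per second in the bandwidth-bound regime.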

To solve this problem, we built Unweight: a lossless compression system that can make model weights up to 15–22% smaller while preserving bit-exact outputs, without relying on any special hardware. The core breakthrough here is that decompressing weights in fast on-chip memory and feeding them directly to the tensor cores avoids an extra round-trip through slow main memory. Depending on the workload, Unweight’s runtime selects from multiple execution strategies – some prioritize Continue reading
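Unweight's format isn't described in detail here, but the reason lossless savings exist at all is easy to demonstrate: trained weights are roughly Gaussian, so their sign/exponent bytes are highly redundant even though their mantissa bytes look random. A toy byte-plane experiment, using simulated weights and generic zlib (both assumptions, not Unweight's actual pipeline):

```python
import random
import struct
import zlib

random.seed(0)
# Simulate float16 weights drawn from a Gaussian, like trained model weights.
raw = b"".join(struct.pack("<e", random.gauss(0, 0.02)) for _ in range(100_000))

# Byte-plane split: low bytes hold mantissa bits (near-random), high bytes
# hold sign + exponent (few distinct values, hence highly compressible).
lo, hi = raw[0::2], raw[1::2]
packed = zlib.compress(lo, 9) + zlib.compress(hi, 9)
print(f"compression ratio: {len(packed) / len(raw):.2f}")  # losslessly below 1.0
```

The redundancy is entirely in the exponent plane; a real system would use a faster, GPU-friendly coder, but the information-theoretic headroom is the same.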

Agents that remember: introducing Agent Memory

As developers build increasingly sophisticated agents on Cloudflare, one of the biggest challenges they face is getting the right information into context at the right time. The quality of results produced by models is directly tied to the quality of context they operate with, but even as context window sizes grow past one million (1M) tokens, context rot remains an unsolved problem. A natural tension emerges between two bad options: keep everything in context and watch quality degrade, or aggressively prune and risk losing information the agent needs later.

Today we're announcing the private beta of Agent Memory, a managed service that extracts information from agent conversations and makes it available when it’s needed, without filling up the context window.

It gives AI agents persistent memory, allowing them to recall what matters, forget what doesn't, and get smarter over time. In this post, we’ll explain how it works — and what it can help you build.
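As a rough illustration of the extract-then-recall pattern, here is a toy sketch; Agent Memory's actual extraction and retrieval are far more sophisticated than the naive sentence filter and keyword overlap assumed below:

```python
class Memory:
    """Toy store: extract facts from conversation turns, recall them later."""

    def __init__(self):
        self.facts = []

    def extract(self, turn):
        # Stand-in for LLM-based extraction: keep simple declarative sentences.
        self.facts += [s.strip() for s in turn.split(".") if " is " in s]

    def recall(self, query, k=2):
        # Stand-in for semantic retrieval: rank facts by keyword overlap.
        words = set(query.lower().split())
        return sorted(self.facts,
                      key=lambda f: len(words & set(f.lower().split())),
                      reverse=True)[:k]

m = Memory()
m.extract("The user's name is Ada. Her favorite color is teal. Thanks!")
print(m.recall("what is the user's favorite color"))
```

The point is the shape of the API: facts live outside the context window, and only the top-k relevant ones are injected back in when needed.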

The state of agentic memory

Agentic memory is one of the fastest-moving spaces in AI infrastructure, with new open-source libraries, managed services, and research prototypes launching on a near-weekly basis. These offerings vary widely in what they store, how they retrieve, and what Continue reading

Technology Short Take 194

This is Technology Short Take #194, the latest in my semi-regular series of posts sharing data center technology-related links and articles from around the web. What have I gathered for readers this time? Key topics in Tech Short Take 194 include a new cryptography-based network stack, a look at why folks are revolting against AI, and an in-depth comparison of microVM technologies. On to the content!

Networking

Security

Your AI Agents Are Autonomous. But Are They Accountable?

Why accountability, not capability, is the real bottleneck for enterprise agentic AI, and what security leaders need to do about it before regulators force the issue.

Every enterprise is building AI agents. Marketing has one summarizing campaign performance. Engineering has one triaging incidents. Customer support has one resolving tickets. Finance has one processing invoices. And increasingly, those agents are talking to each other: calling tools, accessing databases, delegating tasks across complex multi-hop chains.

But here’s the question nobody wants to hear at 3 a.m. when something goes wrong: who authorized that action, what policy permitted it, and what’s the full chain of events?

For most enterprises, the honest answer is: nobody knows. That’s not a governance problem — it’s an AI agent accountability crisis.

Agents Are Scaling Faster Than Governance

The data paints a stark picture. McKinsey research found that 80% of organizations have already encountered risky behavior from AI agents. These actions were unintended, unauthorized, or outside acceptable guardrails. Yet only about one-third of organizations report meaningful governance maturity. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.

This Continue reading

Lab: Integrated Routing and Bridging (IRB) with EVPN MAC-VRF Instances

Most EVPN documentation rushes into the complexities of symmetric IRB, type-5 EVPN routes, and EVPN IP-VRFs, but if you want to master a technology, it’s better to take it slow and proceed with small steps. In today’s lab exercise, we’ll do just that: explore what happens when you add IP addresses (we’ll use anycast gateways) to VLAN interfaces tied to EVPN MAC-VRF instances.

Deployed Is Not the Same as Ready: How Mature Is Your Kubernetes Environment?

Kubernetes adoption is no longer the challenge it once was. More than 82% of enterprises run containers in production, most of them on multiple Kubernetes clusters. Adoption, however, does not mean operational maturity. These are two very different things. It is one thing to deploy workloads to a cluster or two and quite another to do it securely, efficiently and at scale.

This distinction matters because the gap between adoption and Kubernetes operational maturity is where risk accumulates. Operationally mature organizations ship faster, recover from incidents in minutes instead of hours and consistently pass compliance audits. They spend less time dealing with outages and more time delivering new services to their customers.

So what separates maturity from adoption? It comes down to a handful of foundational capabilities that, when done well, result in measurable business impact. Operational maturity — the ability to run Kubernetes workloads securely, efficiently, and at scale, with consistent policy enforcement, cross-cluster observability, and automated incident recovery — is not a destination; it is a continuous process of strengthening the architectural pillars that keep your Kubernetes environment production-ready.

What does operational maturity look like?

Operational maturity spans several interconnected areas, from Kubernetes security best practices to observability Continue reading

Cloudflare’s AI Platform: an inference layer designed for agents

AI models are changing quickly: the best model for agentic coding today might be replaced in three months by a completely different model from a different provider. On top of this, real-world use cases often require calling more than one model. Your customer support agent might use a fast, cheap model to classify a user's message; a large reasoning model to plan its actions; and a lightweight model to execute individual tasks.

This means you need access to all the models, without tying yourself financially and operationally to a single provider. You also need the right systems in place to monitor costs across providers, ensure reliability when one of them has an outage, and manage latency no matter where your users are.

These challenges are present whenever you're building with AI, but they get even more pressing when you're building agents. A simple chatbot might make one inference call per user prompt. An agent might chain ten calls together to complete a single task, and suddenly a single slow provider doesn't add 50ms, it adds 500ms. One failed request isn't just a retry; it can cascade into downstream failures.
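The compounding is simple arithmetic, which is exactly why it bites:

```python
calls = 10                 # inference calls chained for a single agent task
extra_latency_ms = 50      # overhead added per call by one slow provider
p_success = 0.99           # per-call success rate

print("added latency:", calls * extra_latency_ms, "ms")   # 500 ms, not 50
print("chain success:", round(p_success ** calls, 3))     # ~0.904
```

Even at a 99% per-call success rate, roughly one task in ten hits at least one failure somewhere in a ten-call chain.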

Since launching AI Gateway and Workers AI, we’ve seen incredible Continue reading

Building the foundation for running extra-large language models

An agent needs to be powered by a large language model. A few weeks ago, we announced that Workers AI is officially entering the arena for hosting large open-source models like Moonshot’s Kimi K2.5. Since then, we’ve made Kimi K2.5 3x faster and have more model additions in-flight. These models have been the backbone of a lot of the agentic products, harnesses, and tools that we have been launching this week. 

Hosting AI models is an interesting challenge: it requires a delicate balance between software and very, very expensive hardware. At Cloudflare, we’re good at squeezing every bit of efficiency out of our hardware through clever software engineering. This is a deep dive on how we’re laying the foundation to run extra-large language models.

Hardware configurations

As we mentioned in our previous Kimi K2.5 blog post, we're using a variety of hardware configurations in order to best serve models. Which configuration fits best depends largely on the size of the inputs and outputs users send to the model. For example, if you are using a model to write fanfiction, you might give it a few small prompts (input tokens) while asking it to generate Continue reading

Artifacts: versioned storage that speaks Git

Agents have changed how we think about source control, file systems, and persisting state. Developers and agents are generating more code than ever — more code will be written over the next 5 years than in all of programming history — and it’s driven an order-of-magnitude change in the scale of the systems needed to meet this demand. Source control platforms are especially struggling here: they were built to meet the needs of humans, not a 10x change in volume driven by agents who never sleep, can work on several issues at once, and never tire.

We think there’s a need for a new primitive: a distributed, versioned filesystem that’s built for agents first and foremost, and that can serve the types of applications that are being built today.

We're calling this Artifacts: a versioned file system that speaks Git. You can create repositories programmatically, alongside your agents, sandboxes, Workers, or any other compute paradigm, and connect to them from any regular Git client.

Want to give every agent session a repo? Artifacts can do it. Every sandbox instance? Also Artifacts. Want to create 10,000 forks from a known-good starting point? You guessed it: Artifacts again. Artifacts exposes a REST Continue reading

AI Search: the search primitive for your agents

Every agent needs search: coding agents search millions of files across repos, while support agents search customer tickets and internal docs. The use cases are different, but the underlying problem is the same: get the right information to the model at the right time.

If you're building search yourself, you need a vector index, an indexing pipeline that parses and chunks your documents, and something to keep the index up to date when your data changes. If you also need keyword search, that's a separate index and fusion logic on top. And if each of your agents needs its own searchable context, you're setting all of that up per agent. 
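The fusion logic is often the fiddliest part to hand-roll. A common approach is Reciprocal Rank Fusion, shown here as a generic sketch; whether AI Search uses RRF specifically isn't stated above:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from different retrievers."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic retriever's ranking
bm25_hits = ["doc_b", "doc_d", "doc_a"]     # keyword retriever's ranking
print(rrf([vector_hits, bm25_hits]))  # doc_b first: ranked high by both
```

RRF needs no score normalization between the two retrievers, which is why it's a popular default for hybrid search.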

AI Search (formerly AutoRAG) is the plug-and-play search primitive you need. You can dynamically create instances, give them your data, and search — from a Worker, the Agents SDK, or the Wrangler CLI. Here's what we're shipping:

  • Hybrid search. Enable both semantic and keyword matching in the same query. Vector search and BM25 run in parallel, and the results are fused. (The search on our blog is now powered by AI Search. Try the magnifying glass icon at the top right.)

  • Built-in storage and index. New instances come with Continue reading

Deploy Postgres and MySQL databases with PlanetScale + Workers

Cloudflare announced our PlanetScale partnership last September to give Cloudflare Workers direct access to Postgres and MySQL databases for fast, full-stack applications.

Soon, we’re bringing our technologies even closer: you’ll be able to create PlanetScale Postgres and MySQL databases directly from the Cloudflare dashboard and API, and have them billed to your Cloudflare account. 

You choose the data storage that fits your Worker application needs and keep a single system for billing as a Cloudflare self-serve or enterprise customer. Cloudflare credits like those given in our startup program or Cloudflare committed spend can be used towards PlanetScale databases.

Postgres & MySQL for Workers

SQL relational databases like Postgres and MySQL are a foundation of modern applications. In particular, Postgres has risen in developer popularity with its rich tooling ecosystem (ORMs, GUIs, etc.) and extensions like pgvector for building vector search in AI-driven applications. Postgres is the default choice for most developers who need a powerful, flexible, and scalable database to power their applications.

You can already connect your PlanetScale account and create Postgres databases directly from the Cloudflare dashboard for your Workers. Starting next month, a new Cloudflare subscription will bill new PlanetScale databases directly to your Cloudflare Continue reading

Cloudflare Email Service: now in public beta. Ready for your agents

Email is the most accessible interface in the world. It is ubiquitous. There’s no need for a custom chat application, no custom SDK for each channel. Everyone already has an email address, which means everyone can already interact with your application or agent. And your agent can interact with anyone.

If you are building an application, you already rely on email for signups, notifications, and invoices. Increasingly, it is not just your application logic that needs this channel. Your agents do, too. During our private beta, we talked to developers who are building exactly this: customer support agents, invoice processing pipelines, account verification flows, multi-agent workflows. All built on top of email. The pattern is clear: email is becoming a core interface for agents, and developers need infrastructure purpose-built for it.

Cloudflare Email Service is that piece. With Email Routing, you can receive email to your application or agent. With Email Sending, you can reply to emails or send outbound messages to notify your users when your agents are done doing work. And with the rest of the developer platform, you can build a full email client, with the Agents SDK onEmail hook available as native functionality.

Today, as part of Continue reading

Testing FRRouting Pull Requests with netlab

Every other blue moon, I discover a bug in FRRouting (example). Because the FRRouting developers care about their work, it usually gets fixed within a few days, often resulting in a “can you test this PR?” question.

It turns out that’s surprisingly easy to do with netlab – here’s the step-by-step procedure (assuming you already have the topology file that reproduced the bug):

Beyond the Prompt: AI Agent Design Patterns and the New Governance Gap

If you are treating Large Language Models (LLMs) like simple question-and-answer machines, you are leaving their most transformative potential on the table. The industry has officially shifted from zero-shot prompting to structured AI agent design patterns and agentic workflows where AI iteratively reasons, uses external tools, and collaborates to solve complex engineering problems. These design patterns are the architectural blueprints that determine how autonomous Agentic AI systems work and interact with your infrastructure.

But as these systems proliferate faster than organizations can govern them, they introduce critical AI agent security risks. By the end of 2026, 40% of enterprise applications will feature embedded AI agents, and those teams will urgently need purpose-built strategies to govern this new autonomous workforce before it becomes the next major shadow IT crisis.

Before you can secure these autonomous systems, you have to understand how they are built. Here is a technical breakdown of the current AI Agent design patterns you need to know, and the specific security blind spots each design pattern creates.

1. The Foundational Execution Patterns

Building reliable AI systems comes down to how you route the cognitive load. Here are the three baseline structural patterns:

A. The Single Agent Continue reading

D2DO300: Open Source Malware!

Malware has shifted from phishing expeditions to open source packages, domains, and repositories. Ned and Kyler welcome Jenn Gile, co-founder of Open Source Malware, to discuss how malware is making its way into open source software. Together they break down NPM compromises, AI-driven infiltration, malicious agent skills, and more. Episode Links: Open Source Malware –... Read more »

Project Think: building the next generation of AI agents on Cloudflare

Today, we're introducing Project Think: the next generation of the Agents SDK. Project Think is a set of new primitives for building long-running agents (durable execution, sub-agents, sandboxed code execution, persistent sessions) and an opinionated base class that wires them all together. Use the primitives to build exactly what you need, or use the base class to get started fast.

Something happened earlier this year that changed how we think about AI. Tools like Pi, OpenClaw, Claude Code, and Codex proved a simple but powerful idea: give an LLM the ability to read files, write code, execute it, and remember what it learned, and you get something that looks less like a developer tool and more like a general-purpose assistant.

These coding agents aren't just writing code anymore. People are using them to manage calendars, analyze datasets, negotiate purchases, file taxes, and automate entire business workflows. The pattern is always the same: the agent reads context, reasons about it, writes code to take action, observes the result, and iterates. Code is the universal medium of action.
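That loop is simple enough to sketch end-to-end. This is a generic illustration with toy stubs, not the Project Think API:

```python
def agent_loop(task, llm, run_code, max_steps=10):
    """Read context, reason, act by writing code, observe, iterate."""
    memory = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm("\n".join(memory))            # reason over accumulated context
        if step.get("answer") is not None:       # the agent decides it is done
            return step["answer"]
        observation = run_code(step["code"])     # act: execute generated code
        memory.append(f"{step['code']} -> {observation}")  # observe, remember

# Toy "model": asks for one computation, then reads the answer back out.
def toy_llm(context):
    if "->" in context:
        return {"answer": context.rsplit("-> ", 1)[-1]}
    return {"code": "6 * 7"}

print(agent_loop("multiply two numbers", toy_llm, eval))  # prints 42
```

Everything hard about production agents (durable execution, sandboxing the `run_code` step, persisting `memory` across sessions) lives in making this loop safe and long-running.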

Our team has been using these coding agents every day. And we kept running into the same walls: