Orchestrating AI Code Review at scale

Code review is a fantastic mechanism for catching bugs and sharing knowledge, but it is also one of the most reliable ways to bottleneck an engineering team. A merge request sits in a queue, a reviewer eventually context-switches to read the diff, they leave a handful of nitpicks about variable naming, the author responds, and the cycle repeats. Across our internal projects, the median wait time for a first review was often measured in hours.

When we first started experimenting with AI code review, we took the path that most other people probably take: we tried out a few different AI code review tools and found that a lot of these tools worked pretty well, and a lot of them even offered a good amount of customisation and configurability! Unfortunately, though, the one recurring theme that kept coming up was that they just didn’t offer enough flexibility and customisation for an organisation the size of Cloudflare.

So, we jumped to the next most obvious path, which was to grab a git diff, shove it into a half-baked prompt, and ask a large language model to find bugs. The results were exactly as noisy as you might expect, with a flood Continue reading

Multicast PIM Bootstrap Router (BSR) (VI)

Multicast PIM Bootstrap Router (BSR) (VI)

We are at the tail end of our multicast series. So far, we have covered multicast basics, IGMP, PIM Dense Mode, PIM Sparse Mode, and Auto-RP. In this post, we will look at Bootstrap Router, or BSR.

Multicast PIM Auto RP
AutoRP solves this problem by allowing routers to dynamically learn the RP address. Instead of manually configuring the RP on each router
Multicast PIM Bootstrap Router (BSR) (VI)

Bootstrap Router (BSR) Overview

In the previous post, we looked at Auto-RP, which is Cisco's proprietary method for dynamically distributing RP information to all routers in the network. BSR solves the same problem but is defined in the PIM standards (RFC 5059), making it vendor-neutral and interoperable across different platforms. The core idea is the same. Instead of manually configuring the RP address on every router, BSR allows routers to learn RP information automatically. A router called the Bootstrap Router collects RP information from Candidate RPs and distributes it to all PIM routers in the network.

The key difference is in how this information flows. With Auto-RP, the Mapping Agent sends RP mappings to the specific multicast group 224.0.1.40. This creates the chicken-and-egg problem we discussed in the previous post Continue reading

Hedge 302: Communications in Biological Systems

What does biology have to do with computer networks? Much more than you might think. Communications systems, after all, need to solve the same problems–and they often use the same kinds of tools. In this episode of the Hedge, Emily Reeves and Joe Deweese join Russ and Tom to talk about a recent paper comparing computer communications to biological communications.
 

 
download

Introducing the Agent Readiness score. Is your site agent-ready?

The web has always had to adapt to new standards. It learned to speak to web browsers, and then it learned to speak to search engines. Now, it needs to speak to AI agents.

Today, we are excited to introduce isitagentready.com — a new tool to help site owners understand how they can make their sites optimized for agents, from guiding agents on how to authenticate, to controlling what content agents can see, the format they receive it in, and how they pay for it. We are also introducing a new dataset to Cloudflare Radar that tracks the overall adoption of each agent standard across the Internet.

We want to lead by example. That is why we are also sharing how we recently overhauled Cloudflare's Developer Documentation to make it the most agent-friendly documentation site, allowing AI tools to answer questions faster and significantly cheaper.

How agent-ready is the web today?

The short answer: not very. This is expected, but also shows how much more effective agents can be than they are today, if standards are adopted.

To analyze this, Cloudflare Radar took the 200,000 most visited domains on the Internet; filtered out categories where agent readiness isn't important Continue reading

Shared Dictionaries: compression that keeps up with the agentic web

Web pages have grown 6-9% heavier every year for the past decade, spurred by the web becoming more framework-driven, interactive, and media-rich. Nothing about that trajectory is changing. What is changing is how often those pages get rebuilt and how many clients request them. Both are skyrocketing because of agents. 

Shared dictionaries shrink asset transfers from servers to browsers so pages load faster with less bloat on the wire, especially for returning users or visitors on a slow connection. Instead of re-downloading entire JavaScript bundles after every deploy, the browser tells the server what it already has cached, and the server only sends the file diffs. 

Today, we’re excited to give you a sneak peek of our support for shared compression dictionaries, show you what we’ve seen in early testing, and reveal when you’ll be able to try the beta yourself (hint: it’s April 30, 2026!). 

The problem: more shipping = less caching

Agentic crawlers, browsers, and other tools hit endpoints repeatedly, fetching full pages, often to extract a fragment of information. Agentic actors represented just under 10% of total requests across Cloudflare's network during March 2026, up ~60% year-over-year. 

Every page shipped is heavier Continue reading

Agents Week: network performance update

When it comes to the Internet, performance is everything. Every millisecond shaved off a connection is a better experience for the real people using the applications and websites you build. That's why, at Cloudflare, we measure our performance constantly and share updates on a regular basis. 

In our last performance post, published during Birthday Week 2025, we shared that Cloudflare was the fastest network in 40% of the largest 1,000 networks in the world. At the time, we noted a nuanced reading of that figure; we were competitive in many more networks, and the gaps were often notably small. But even so, we were not satisfied with 40%. By December 2025 (our most recent available analysis), we had become the fastest provider in 60% of the top networks. Here's how we got there, and what it means.

How do we measure and compare network performance?

Before diving into the results, let’s review how we collect the data. We start with the 1,000 largest networks in the world by estimated population, using APNIC's data as our source. These networks represent real users in nearly every geography, giving us a broad and meaningful picture of how Internet users experience the Continue reading

Introducing Flagship: feature flags built for the age of AI

AI is writing more code than ever. AI-assisted contributions now account for a rapidly growing share of new code across the platform. Agentic coding tools like OpenCode and Claude Code are shipping entire features in minutes.

AI-generated code entering production is only going to accelerate. But the bigger shift isn't just speed — it's autonomy.

Today, an AI agent writes code and a human reviews, merges, and deploys it. Tomorrow, the agent does all of that itself. The question becomes: how do you let an agent ship to production without removing every safety net?

Feature flags are the answer. An agent writes a new code path behind a flag and deploys it — the flag is off, so nothing changes for users. The agent then enables the flag for itself or a small test cohort, exercises the feature in production, and observes the results. If metrics look good, it ramps the rollout. If something breaks, it disables the flag. The human doesn't need to be in the loop for every step — they set the boundaries, and the flag controls the blast radius.

This is the workflow feature flags were always building toward: not just decoupling deployment from release, but Continue reading

Redirects for AI Training enforces canonical content

Cloudflare's Wrangler CLI has published several major versions over the past six years, each containing at least some critical changes to commands, configuration, or how developers interact with the platform. Like any actively maintained open-source project, we keep documentation for older versions available. The v1 documentation carries a deprecation banner, a noindex meta tag, and canonical tags pointing to current docs. Every advisory signal says the same thing: this content is outdated, look elsewhere. AI training crawlers don’t reliably honor those signals. 

We use AI Crawl Control on developers.cloudflare.com, so we know that bots in the AI Crawler Category visited 4.8 million times over the last 30 days, and they consumed deprecated content at the same rate as current content. The advisory signals made no measurable difference. The effect is cumulative because AI agents don't always fetch content live; they draw on trained models. When crawlers ingest deprecated docs, agents inherit outdated foundations.

Today, we’re launching Redirects for AI Training to let you enforce that verified AI training crawlers are redirected to up-to-date content. Your existing canonical tags become HTTP 301 redirects for verified AI training crawlers, automatically, with one toggle, on all paid Continue reading

Unweight: how we compressed an LLM 22% without sacrificing quality

Running inference within 50ms of 95% of the world's Internet-connected population means being ruthlessly efficient with GPU memory. Last year we improved memory utilization with Infire, our Rust-based inference engine, and eliminated cold-starts with Omni, our model scheduling platform. Now we are tackling the next big bottleneck in our inference platform: model weights.

Generating a single token from an LLM requires reading every model weight from GPU memory. On the NVIDIA H100 GPUs we use in many of our datacenters, the tensor cores can process data nearly 600 times faster than memory can deliver it, leading to a bottleneck not in compute, but memory bandwidth. Every byte that crosses the memory bus is a byte that could have been avoided if the weights were smaller.

To solve this problem, we built Unweight: a lossless compression system that can make model weights up to 15–22% smaller while preserving bit-exact outputs, without relying on any special hardware. The core breakthrough here is that decompressing weights in fast on-chip memory and feeding them directly to the tensor cores avoids an extra round-trip through slow main memory. Depending on the workload, Unweight’s runtime selects from multiple execution strategies – some prioritize Continue reading

Agents that remember: introducing Agent Memory

As developers build increasingly sophisticated agents on Cloudflare, one of the biggest challenges they face is getting the right information into context at the right time. The quality of results produced by models is directly tied to the quality of context they operate with, but even as context window sizes grow past one million (1M) tokens, context rot remains an unsolved problem. A natural tension emerges between two bad options: keep everything in context and watch quality degrade, or aggressively prune and risk losing information the agent needs later.

Today we're announcing the private beta of Agent Memory, a managed service that extracts information from agent conversations and makes it available when it’s needed, without filling up the context window.

It gives AI agents persistent memory, allowing them to recall what matters, forget what doesn't, and get smarter over time. In this post, we’ll explain how it works — and what it can help you build.

The state of agentic memory

Agentic memory is one of the fastest-moving spaces in AI infrastructure, with new open-source libraries, managed services, and research prototypes launching on a near-weekly basis. These offerings vary widely in what they store, how they retrieve, and what Continue reading

Technology Short Take 194

This is Technology Short Take #194, the latest in my semi-regular series of posts sharing data center technology-related links and articles from around the web. What have I gathered for readers this time? Key topics in Tech Short Take 194 include a new cryptography-based network stack, a look at why folks are revolting against AI, and an in-depth comparison of microVM technologies. On to the content!

Networking

Security

Your AI Agents Are Autonomous. But Are They Accountable?

Why accountability, not capability, is the real bottleneck for enterprise agentic AI, and what security leaders need to do about it before regulators force the issue.

Every enterprise is building AI agents. Marketing has one summarizing campaign performance. Engineering has one triaging incidents. Customer support has one resolving tickets. Finance has one processing invoices. And increasingly, those agents are talking to each other: calling tools, accessing databases, delegating tasks across complex multi-hop chains.

But here’s the question nobody wants to hear at 3 a.m. when something goes wrong: who authorized that action, what policy permitted it, and what’s the full chain of events?

For most enterprises, the honest answer is: nobody knows. That’s not a governance problem — it’s an AI agent accountability crisis.

Agents Are Scaling Faster Than Governance

The data paints a stark picture. McKinsey research found that 80% of organizations have already encountered risky behavior from AI agents. These actions were unintended, unauthorized, or outside acceptable guardrails. Yet only about one-third of organizations report meaningful governance maturity. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.

This Continue reading

Lab: Integrated Routing and Bridging (IRB) with EVPN MAC-VRF Instances

Most EVPN documentation rushes into the complexities of symmetric IRB, type-5 EVPN routes, and EVPN IP-VRFs, but if you want to master a technology, it’s better to take it slow and proceed with small steps. In today’s lab exercise, we’ll do just that: explore what happens when you add IP addresses (we’ll use anycast gateways) to VLAN interfaces tied to EVPN MAC-VRF instances.

Deployed Is Not the Same as Ready: How Mature Is Your Kubernetes Environment?

Kubernetes adoption is no longer the challenge it once was. More than 82% of enterprises run containers in production, most of them on multiple Kubernetes clusters. Adoption, however, does not mean operational maturity. These are two very different things. It is one thing to deploy workloads to a cluster or two and quite another to do it securely, efficiently and at scale.

This distinction matters because the gap between adoption and Kubernetes operational maturity is where risk accumulates. Operationally mature organizations ship faster, recover from incidents in minutes instead of hours and consistently pass compliance audits. They spend less time dealing with outages and more time delivering new services to their customers.

So what separates maturity from adoption? It comes down to a handful of foundational capabilities that, when done well, result in measurable business impact. Operational maturity — the ability to run Kubernetes workloads securely, efficiently, and at scale, with consistent policy enforcement, cross-cluster observability, and automated incident recovery — is not a destination; it is a continuous process of strengthening the architectural pillars that keep your Kubernetes environment production-ready.

What does operational maturity look like?

Operational maturity spans several interconnected areas from Kubernetes security best practices to observability Continue reading