Over a little more than two quarters, we've undertaken an intensive engineering effort, internally code-named "Code Orange: Fail Small", focused on making Cloudflare's infrastructure more resilient, secure, and reliable for every customer.
Earlier this month, the Cloudflare team finished this work.
While improving resiliency will never be a “job done” and will always be a top priority across our development lifecycle, we have now completed the work that would have avoided the November 18, 2025 and December 5, 2025 global outages.
This work focused on several key areas: safer configuration changes, reducing the impact of failure, and revising our “break glass” procedures and incident management. We also introduced measures to prevent drift and regressions over time, and strengthened the way we communicate to our customers during an outage.
Here we explain in depth what we shipped, and what it means for you.
What it means for you: In most cases, Cloudflare's internal configuration changes no longer reach our network instantly and are instead rolled out progressively with real-time health monitoring. This allows our observability tools to catch problems and revert the offending changes before they affect your traffic.
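The idea of a progressive rollout gated on health checks can be sketched in a few lines of Python. This is a toy illustration only, not Cloudflare's deployment system: the stage fractions, the `apply`, `healthy`, and `rollback` callbacks, and the 10% failure threshold are all assumptions made up for the example.

```python
from typing import Callable, Sequence


def progressive_rollout(
    stages: Sequence[float],
    apply_to_fraction: Callable[[float], None],
    is_healthy: Callable[[], bool],
    revert: Callable[[], None],
) -> bool:
    """Roll a change out stage by stage; revert on the first failed health check.

    Returns True if the change reached the final stage, False if it was reverted.
    """
    for fraction in stages:
        apply_to_fraction(fraction)  # expose the change to a larger share
        if not is_healthy():         # health signal is checked after every stage
            revert()                 # roll the change back everywhere
            return False
    return True


# Toy usage: health degrades once more than 10% of the fleet has the change.
state = {"fraction": 0.0}


def apply(fraction: float) -> None:
    state["fraction"] = fraction


def healthy() -> bool:
    return state["fraction"] <= 0.10


def rollback() -> None:
    state["fraction"] = 0.0


completed = progressive_rollout([0.01, 0.10, 0.50, 1.0], apply, healthy, rollback)
print(completed, state["fraction"])  # prints: False 0.0
```

In this toy run the change is reverted at the 50% stage, so it never reaches the whole fleet.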
In order to catch potentially dangerous deployments Continue reading
When we first launched Workers eight years ago, it was a direct-to-developers platform. Over the years, we have expanded and scaled the ecosystem so that platforms could not only build on Workers directly, but also enable their customers to ship code to us through many multi-tenant applications. We now see on Workers: applications where users describe what they want, and the AI writes the implementation; multi-tenant SaaS where every customer's business logic is, at runtime, some TypeScript the platform has never seen before; agents that write and run their own tools; and CI/CD products where every repo defines its own pipeline.
Last month, when we shipped the Dynamic Workers open beta, we gave those platforms a clean primitive for the compute side: hand the Workers runtime some code at runtime, get back an isolated, sandboxed Worker, on the same machine, in single-digit milliseconds. Durable Object Facets extended the same idea to storage — each dynamically-loaded app can have its own SQLite database, spun up on demand, with the platform sitting in front, as a supervisor. Artifacts did the same for source control: a Git-native, versioned filesystem you can create by the tens of millions, one per agent, Continue reading
While more than two-thirds of human-generated TLS traffic to Cloudflare is already protected by post-quantum cryptography, the world of site-to-site networking has been a different story. For years, the IPsec community remained caught between the high bar of Internet-scale interoperability and the niche requirements of specialized hardware. That gap is now closing.
Earlier this month, we announced that Cloudflare has moved its target for full post-quantum security forward to 2029, spurred by several recent advances in quantum computing. To advance that goal, we’ve made post-quantum encryption in Cloudflare IPsec generally available.
Using the new IETF draft for hybrid ML-KEM (FIPS 203), we’ve successfully tested interoperability with branch connectors from Fortinet and Cisco — meaning you can start protecting your wide-area network (WAN) against harvest-now-decrypt-later attacks today using hardware you already have.
This post explains how we implemented the new hybrid IPsec handshake, why it took four years longer to land than its TLS counterpart, and how the industry is finally consolidating around a standard that works at Internet scale.
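To make "hybrid" concrete: a hybrid handshake feeds both a classical shared secret and a post-quantum shared secret into key derivation, so an attacker has to break both to recover the session key. The sketch below shows only that combining step. It is not the IKEv2 key derivation and not Cloudflare's implementation; `combine_secrets`, the context label, and the use of HMAC-SHA256 are assumptions chosen for illustration, and random bytes stand in for the real key-exchange outputs.

```python
import hashlib
import hmac
import os


def combine_secrets(classical: bytes, post_quantum: bytes, context: bytes) -> bytes:
    """Derive one session key from both shared secrets (illustrative KDF only)."""
    # Length-prefix each secret so the concatenation is unambiguous.
    material = b"".join(
        len(secret).to_bytes(2, "big") + secret
        for secret in (classical, post_quantum)
    )
    return hmac.new(context, material, hashlib.sha256).digest()


# Random bytes stand in for the output of a real Diffie-Hellman exchange
# and a real ML-KEM encapsulation; no actual key exchange happens here.
dh_secret = os.urandom(32)
mlkem_secret = os.urandom(32)

session_key = combine_secrets(dh_secret, mlkem_secret, b"example-session")
print(len(session_key))  # prints: 32
```

Changing either input changes the derived key, which is the property that protects recorded traffic against harvest-now-decrypt-later attacks as long as one of the two exchanges stays unbroken.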
Cloudflare IPsec is a WAN Network-as-a-Service that replaces legacy network architectures by connecting data centers, branch offices, and cloud VPCs to Cloudflare's global IP Anycast Continue reading
Coding agents are great at building software. But to deploy to production they need three things from the cloud that will host their app: an account, a way to pay, and an API token. Until now these have been tasks that humans handle directly. Increasingly, agents handle them on the user’s behalf. Agents need to be able to perform every task a human customer can. They’re given higher-order problems to solve, and they choose to use Cloudflare and call Cloudflare APIs.
Starting today, agents can provision Cloudflare on behalf of their users. They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. Humans can be in the loop to grant permission, but no human steps are required from start to finish. There’s no need to go to the dashboard, copy and paste API tokens, or enter credit card details. Without any extra setup, agents have everything they need to deploy a new production application in one shot. And with Cloudflare’s Code Mode MCP server and Agent Skills, they’re even better at it.
This all works via a new protocol that we’ve co-designed with Stripe as part Continue reading
At ITNOG 10, I saw something that I hadn’t seen in a very long time: a mini-Interop-style physical lab using a dozen devices from different vendors. The network core was a leaf-and-spine fabric with off-path BGP route reflectors and numerous other devices attached to it.
I’ve configured a few networks in the past, so I know it must have been a beast to configure all those devices by hand (and fix all the IP addressing errors), but then a thought struck me: unless one wants to practice configuring IP addresses, it might be a good idea to use netlab to generate the IP addressing plan and partial device configurations.

Pytest is a Python testing framework. It is primarily used by developers to test their code and make sure it behaves as expected. For example, if you write a function that adds two numbers, you can write a test to verify that the function returns the correct result. If it does, the test passes. If not, the test fails, and pytest tells you exactly where things went wrong.
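The add-two-numbers case above might look like the following minimal sketch. The function name `add` and the test names are made up for the example.

```python
def add(a, b):
    """Return the sum of two numbers."""
    return a + b


def test_add_returns_sum():
    # pytest collects functions named test_* and reports any failing assert
    assert add(2, 3) == 5


def test_add_handles_negative_numbers():
    assert add(-1, 1) == 0
```

Save it as `test_add.py` and run `pytest` in the same directory; both tests should pass.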
That is the traditional use case, but pytest is not limited to testing code. You can use it to test anything that can be scripted in Python, and that includes testing your network.
In this series, we will use pytest to write tests that connect to network devices and verify their state. For example, we can write a test that connects to a router and checks whether BGP is up. If BGP is up, the test passes. If not, the test fails. We can also check things like interface states, routing table entries, OSPF neighbours, or really anything else you can pull from a device.
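As a rough sketch of the shape such a test could take, here is a BGP check written against canned data. `get_bgp_neighbors`, the hostname `r1`, and the peer addresses are hypothetical placeholders. A real test would replace the stub with code that connects to the device and parses its output.

```python
# Hypothetical stand-in for a function that would log in to a router and
# parse its BGP summary; here it simply returns canned data.
def get_bgp_neighbors(hostname):
    canned = {
        "r1": {"10.0.0.2": "Established", "10.0.0.3": "Established"},
    }
    return canned[hostname]


def test_bgp_sessions_are_established():
    neighbors = get_bgp_neighbors("r1")
    down = [peer for peer, state in neighbors.items() if state != "Established"]
    # An empty list means every session is up; otherwise the failure
    # message lists the peers that are not established.
    assert down == [], f"BGP sessions not established: {down}"
```

Collecting the failing peers into a list before asserting means a failed run names every broken session at once instead of stopping at the first one.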
Here is what nobody putting together the business case for a VM migration to Kubernetes will tell you upfront: the compute is the easy part.
Moving workloads off vSphere and onto Kubernetes is conceptually straightforward. The tooling has matured. The architecture is proven. Compute moves, storage remaps, and the platform team has a plan.
The network is where projects quietly stall.
Not because the technology does not work. Because nobody scoped the network properly before the project started. A platform migration turned into a multi-team coordination exercise. The firewall team needed a change window. The security team needed to review a network placement that changed when it should not have needed to. The application team discovered hardcoded IPs that nobody documented.
Six months later, half the VMs are still on vSphere and the project is technically “in progress.”
This is not a skills gap. It happens at the most mature organisations with capable teams. It is a scoping problem, and it has a specific cause: the gap between how VM networking works and how Kubernetes networking works is wider than it looks on a migration plan.
This post is for the people who approve these projects. Here is what Continue reading
This chapter explains how to build a SONiC virtual test environment on a Windows computer. First, we enable the required Windows features for WSL 2 and update and verify the WSL installation. Next, we install an Ubuntu distribution and validate that the Linux environment is working correctly, including basic resource checks (CPU, memory, and disk). After the Linux environment is ready, we install Docker Engine from Docker’s official repository and complete the required post-installation steps to run containers. We then install Containerlab, download the SONiC virtual switch image (docker-sonic-vs.gz), copy it into WSL, and load it into Docker. Finally, we install Visual Studio Code on Windows and connect it to WSL to make creating and editing the YAML topology files easier. The next chapter uses this environment to define and deploy a simple SONiC-based topology.
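The basic resource checks mentioned above can also be scripted once the Ubuntu distribution is running. The following is a minimal sketch using only the Python standard library, not the procedure the chapter itself uses. The function name `resource_summary` is made up, and the memory figure relies on `/proc/meminfo`, which exists only on Linux (including WSL).

```python
import os
import shutil


def resource_summary(path="/"):
    """Collect CPU count, total memory (MiB) and free disk space (GiB)."""
    mem_total_mib = None
    try:
        # /proc/meminfo is Linux-specific; MemTotal is reported in kB
        with open("/proc/meminfo") as meminfo:
            for line in meminfo:
                if line.startswith("MemTotal:"):
                    mem_total_mib = int(line.split()[1]) // 1024
                    break
    except OSError:
        pass  # not on Linux: leave the memory figure as None
    disk = shutil.disk_usage(path)
    return {
        "cpus": os.cpu_count(),
        "mem_total_mib": mem_total_mib,
        "disk_free_gib": round(disk.free / 2**30, 1),
    }


print(resource_summary())
```

Run it inside the WSL shell with `python3` to see the resources the Linux environment actually has available.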
WSL 2 requires two Windows features to be enabled. The first feature, Microsoft-Windows-Subsystem-Linux (Example 1-1), enables WSL. The second feature, VirtualMachinePlatform (Example 1-2), is required to run WSL 2.
In this example, both features are enabled using Microsoft PowerShell (Run as Administrator) with the dism.exe command. The options used are:
Naveen Kumar Devaraj was reading my Integrated Routing and Bridging (IRB) with EVPN MAC-VRF Instances lab exercise and spotted this detail:
Arista EOS originates MAC-IP routes with and without IP addresses, effectively doubling the size of the EVPN BGP table
He kindly wrote a LinkedIn comment explaining that behavior: