Kubernetes has come a long way since its debut in 2014. It’s gone from running a couple of containerized microservices to orchestrating fleets of production workloads spanning everything from AI agents to full-scale VMs running in pods. As Kubernetes adoption grows and its use cases stretch to cover more ground, managing its increasingly complex networking and security landscape demands operational maturity and a platform that supports it.
The Spring 2026 release of Calico provides that support in two key areas:
Unified operations across Kubernetes pods and VMs
Tony Mattke built several networking-focused CLI tools and released them on GitHub. You might find them useful.
Cloudflare's core is the centralized data centers that run our control plane, billing, and analytics — distinct from the globally distributed edge that handles user traffic. Core servers are bare metal, and when issues happen during reboot, the consequences can cascade fast.
Their boot sequence is orchestrated by UEFI, the modern firmware standard that initializes hardware and hands off control to the operating system. Small quirks in that handoff can have outsized consequences.
After a routine firmware update, some of our core servers were taking four hours to come back online, rather than just minutes as they did before. What should have been a one-day fleet-wide rollout was stretching into multi-day slogs. New nodes faced the full timeout gauntlet on their very first boot. Maintenance windows ballooned. Engineering teams had to babysit upgrades that should have run unattended.
The behavior we saw was brought to light when we were bringing nodes online that had been powered off for an extended period. These nodes’ firmware was out of date and required multiple updates to resolve. Combine this with recent updates to the boot protocols used by servers in some of our locations, and boot times on the affected …
After the simple SR-MPLS demo and the dual-stack SR-MPLS setup, it was time for the next obvious question: Does SR-MPLS work over unnumbered IPv4 interfaces, assuming the implementation of the underlying routing protocol supports them? Of course it does; let’s go through the details, using the same topology I used throughout the Segment Routing workshop @ ITNOG10.
If you advertise routes into the default free zone (or global Internet), you might struggle with seeing and understanding what they look like “on the other side.” While there are many manual tools to help operators with this process, bgproutes.io gives you visibility into the global routing table through interfaces like BMP. Listen to this episode of the Hedge to learn more.
You can find bgproutes.io here.
Here’s a short glimpse into the history of telecommunications: in a building at the top of this mountain (barely noticeable blip across the saddle from the radio tower; search for Capo Figari for more details), Guglielmo Marconi conducted experiments in the ~1930s (after inventing the wireless telegraph system in the late 1890s).
The original radio could “transmit” at most 40-60 words per minute (the limit of a skilled Morse Code operator). 130 years later, I’m writing this blog post using a 200 Mbps Internet connection via a low-earth-orbit satellite with response times low enough that I can run an interactive SSH session with no noticeable delay. It’s almost incomprehensible how far we’ve come in such a short time.
Cloudflare processes more than a billion events every second. Our network spans 330+ cities in 120+ countries. Behind every HTTP request, every Worker invocation, every R2 read operation, there is data, and a lot of it.
For years, that data was not very easy to access. It lived in dozens of production databases, ClickHouse clusters, Kafka streams, Google Cloud buckets, BigQuery datasets, and a long tail of pipelines. To answer a simple question like "How many domains that signed up today are in the Top 100 by traffic?", an analyst at Cloudflare had to know which system to ask, what credentials to use, what query language to write, and whether the data they were looking at was sampled, fresh, or seven days stale. As a result, it was difficult to glean informed insights from the data.
To solve this problem, we built two in-house tools: Town Lake, Cloudflare's unified data analytics platform, and Skipper, an AI data agent that runs on top of it. Town Lake is a single SQL interface to everything Cloudflare knows, and Skipper is how anyone at Cloudflare can ask questions in plain English and get correct, auditable answers back in seconds.
This is the story …
Doug Madory wrote an interesting article (published on APNIC blog) arguing that we shouldn’t worry about ephemeral BGP leaks that can be observed only during the BGP path hunting process that follows a route withdrawal.
I have to disagree with that. It’s never a good idea to ignore a dead canary in the coal mine.
While the ephemeral leaks do not impact the end result (after all, the route is gone), they are an important indicator of the lack of BGP route policy enforcement in the autonomous systems that propagate them. If an autonomous system is propagating a bogus route when no better routes are available, it’s equally likely to propagate a bogus route when an intruder manages to inject it.