Cloudflare processes more than a billion events every second. Our network spans 330+ cities in 120+ countries. Behind every HTTP request, every Worker invocation, every R2 read operation, there is data, and a lot of it.
For years, that data was not very easy to access. It lived in dozens of production databases, ClickHouse clusters, Kafka streams, Google Cloud buckets, BigQuery datasets, and a long tail of pipelines. To answer a simple question like "How many domains that signed up today are in the Top 100 by traffic?", an analyst at Cloudflare had to know which system to ask, what credentials to use, what query language to write, and whether the data they were looking at was sampled, fresh, or seven days stale. As a result, it was difficult to glean informed insights from the data.
To solve this problem, we built two in-house tools: Town Lake, Cloudflare's unified data analytics platform, and Skipper, an AI data agent that runs on top of it. Town Lake is a single SQL interface to everything Cloudflare knows, and Skipper is how anyone at Cloudflare can ask questions in plain English and get correct, auditable answers back in seconds.
This is the story Continue reading
In The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders, we walked through each pillar of AI agent accountability (traceability, authorization provenance, identity and ownership, policy at scale, and human oversight) and argued that most enterprises today sit at Level 0 or Level 1 of the Accountability Maturity Model.
The most common reaction we get when we share that framework is some version of: “We’re already covered. We have network policies. We have an API gateway. We have RBAC.”
This article is for that reaction.
Enterprises aren’t starting from zero. Most have invested in security, networking, and identity infrastructure that works well for traditional workloads. The problem isn’t a lack of tools. It’s that existing tools were designed for a different world: one where services are deterministic, communication patterns are predictable, and humans make all the decisions.
Agentic AI breaks every one of those assumptions. Here’s where the most common approaches each leave a critical accountability gap.
Kubernetes Network Policies are essential for securing any cluster. They restrict which pods can communicate with which other pods at the network level, and they should absolutely Continue reading
On Tuesday, May 26, Iran’s vice president announced that Internet access had started to be restored in the country after being cut off almost three months ago, following the launch of U.S. and Israeli attacks on February 28.
Cloudflare Radar data confirms increased activity and indicates a partial restoration of the Internet in Iran. In this blog post, we’ll examine a range of data points that provide a lens into this prolonged shutdown – and the signs that Iran’s citizens are increasingly able to connect once again. As the situation continues to unfold, Radar will have the latest data on Iran’s connectivity.
Iranian citizens have experienced two national Internet shutdowns this year. The first began on January 8 around 16:30 UTC (20:00 local time), and we explored the impact seen over the first few days in a blog post. Traffic from Iran remained near zero until January 21, when a small amount of traffic returned, only to disappear a little over 24 hours later. A similar brief restoration also occurred on January 25, before traffic recovered more fully beginning on January 27.
In late February, as military strikes on Iran escalated, a second Continue reading

Imagine yourself walking down a country lane, lush green grass around you, no farm animals anywhere, when suddenly you see a fence right in the middle of the path. You think, now, that’s a bit silly, that fence is blocking the path, somebody should have this fence removed. And by thinking that you’d fall right into the predicament known as Chesterton’s Fence. That is, you see something that you instinctively feel does not belong and you want to remove it. And perhaps that is exactly what needs to be done, but not before you ask a very important question, “why”? Why is the fence here? What function does it serve? Who put it there? What were they trying to achieve?

In any complex system, and most of the systems we work with these days are complex, problems often arise as a result of relationships and interactions between components. Our systems contain many components, some with special optimizations, some acting as local stabilizers, that might appear inefficient and unintuitive. Other components, or parts of the system, seem to serve no apparent purpose at all.
Any given component is usually self-contained and can be understood, reasoned about, modified and improved by one Continue reading
Two platforms, two teams, two procurement relationships, all doing one job. There’s a reason it ended up this way. There isn’t a reason it has to stay this way.
Ask anyone at a typical enterprise why the VM platform and the container platform are separate, and they’ll give you a sensible answer. The VM estate has been there for fifteen years. It runs the workloads the business depends on. Kubernetes got stood up later, when application teams started building microservices, and giving them their own environment made more sense than retrofitting one onto VMware. Two platforms, two teams, two roadmaps.
That’s how most enterprises got here.
The reasoning was sound at the time. The question is whether it still is.
This is the consolidation question most enterprises haven’t actually revisited, and it’s the one quietly absorbing more of your budget each year.

If you operate both platforms, you know the shape of this already. There’s a VMware team: vSphere admins, network engineers who know NSX, storage specialists, plus a separate procurement relationship for the underlying virtualisation stack. Then there’s a Kubernetes team: platform Continue reading
Did you manage not to stumble on a dramatic post explaining how someone generated 10,000 lines of code with AI while wasting time on your LinkedIn feed? Congratulations, you’re lucky.
However, as Nathaniel Fishel explained in his Your Code Is Worthless article, “lines of code” is a useless vanity metric that sounds great in LinkedIn self-promotion but doesn’t matter when one has to maintain the product one has shipped to customers. Add natural laziness, and you have a perfect storm. As he wrote:
This is just a short followup to the last RustRadio post. If you came for more rants about C, you’ll be disappointed.
I’ve never been that interested in writing UI code, including HTML. You can see the “programmer art” in the screenshots linked from www.habets.pp.se.
And then the slightly different tech section, that doesn’t serve much of a purpose now that we have GitHub.
I’ve not been happier with GTK, Qt, and the others either.
But RustRadio needs a UI.
I feel like the browser is the most stable and portable UI. So I’d already decided on that. So now I have to manually do a bunch of DOM manipulation, to create an interactive UI? Or worse, learn the React/Angular/Whatever flavor of the day, that will be obsolete by next afternoon? Gag me with a spoon.
For now I’m just continuing to focus on the SDR and architectural parts of RustRadio, and I’m letting the LLM-written code do the HTML manipulation.
Yeah, it’s kinda vibe coding. But it doesn’t use unsafe, and it demonstrably outputs what I want (I mean, sure, it may require some follow-up prompts), so who cares?
The Continue reading
Practically no one runs a single Kubernetes cluster in production these days. Maybe that’s how it started but data sovereignty requirements, acquisitions, AI initiatives and the need for edge servers, among other considerations, have pulled most enterprises into multi-cluster territory whether they planned for it or not. Reaching Kubernetes operational maturity—the point at which a fleet of clusters operates as one secure, observable, policy-consistent system—depends entirely on how those clusters are connected. Operating in a multi-cluster environment has evolved into the unspoken standard, one requiring a careful re-evaluation of the network architectures used to link clusters together.
That re-evaluation rarely happens. Most enterprises connect their clusters with the same networking patterns they were using before Kubernetes existed: load balancers fronting internal services, DNS records published to external zones, and IP-based firewall rules. Those patterns were built for north-south traffic moving in and out of a traditional data center perimeter, not for east-west traffic moving between internal workloads.
The conventional way to make services in one cluster reachable from another is to expose them externally, with a load balancer in front, a DNS name registered in a public zone, and a firewall rule allowing traffic in. Continue reading
SONiC is a vendor-neutral, Linux-based network operating system (NOS) that uses a database-driven architecture. Its software components run in multiple containers and exchange information through Redis. In SONiC, several named databases are defined for different functions, and these databases are mapped to Redis logical database IDs. Through this design, configuration data, application state, operational state, and ASIC-related state move between software layers by means of specialized processes.
Different hardware vendors may add their own platform integrations, transceiver support, monitoring utilities, or management workflows. However, the core SONiC architecture remains the same. This is one of the main reasons why SONiC knowledge, troubleshooting methods, and automation practices are transferable across different hardware platforms.
Vendor neutrality does not mean that every SONiC-based implementation behaves exactly the same in every operational detail. It means that different implementations follow the same architectural model. To organize information clearly, SONiC defines several named databases, each of which is mapped to a Redis logical database ID:
· CONFIG_DB (Redis DB 4): Stores the user’s intended configuration.
· APPL_DB (Redis DB 0): Stores application-level objects that are ready for processing by lower software layers.
· STATE_DB (Redis DB 6): Stores operational state information about system Continue reading
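The name-to-ID mapping listed above can be sketched in a few lines of Python. This is a minimal illustration, not SONiC's own tooling: `SONIC_DB_IDS` and `redis_cli_args` are hypothetical names, and the table covers only the three databases named in this excerpt. The only external detail it relies on is the `-n` option of `redis-cli`, which selects a logical database by number.

```python
# Hypothetical lookup table: only the databases named in the excerpt above.
SONIC_DB_IDS = {
    "APPL_DB": 0,
    "CONFIG_DB": 4,
    "STATE_DB": 6,
}


def redis_cli_args(db_name, *command):
    """Build a redis-cli argument list that targets the named database."""
    db_id = SONIC_DB_IDS[db_name]  # raises KeyError for an unknown name
    return ["redis-cli", "-n", str(db_id), *command]


# Example: list every key in CONFIG_DB (Redis logical database 4).
config_keys_cmd = redis_cli_args("CONFIG_DB", "KEYS", "*")
```

The resulting list can be handed to `subprocess.run` on a switch where `redis-cli` is available. Building the arguments as a list rather than a single string avoids shell quoting problems with patterns such as `*`.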
To associate routing information—like AS paths or BGP communities—to flows, Akvorado can import routes through the BGP Monitoring Protocol (BMP). As the Internet routing table contains more than 1 million routes, Akvorado needs to scale to tens of millions of routes.1 This has been a long-standing challenge,2 but I expect this issue is now fixed by using RIB sharding, a method that splits the routing database into several parts to enable concurrent updates.
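To make the idea of sharding concrete, here is a minimal Python sketch of a routing table split into independently locked parts. It is an assumption-laden illustration, not Akvorado's implementation: the `ShardedRIB` class, the CRC32-based shard selection, and the shard count are all hypothetical, and the table only supports exact-prefix lookups rather than longest-prefix matching. The sample routes reuse addresses and AS numbers from this article's example.

```python
import ipaddress
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class ShardedRIB:
    """Toy RIB split into independently locked shards (exact-match only)."""

    def __init__(self, shard_count=4):
        self._shards = [{} for _ in range(shard_count)]
        self._locks = [threading.Lock() for _ in range(shard_count)]

    def _shard_index(self, network):
        # Hypothetical placement rule: hash the prefix text to pick a shard.
        return zlib.crc32(str(network).encode()) % len(self._shards)

    def add_route(self, prefix, next_hop, as_path):
        network = ipaddress.ip_network(prefix)
        index = self._shard_index(network)
        # Only this shard is locked, so writers to other shards can proceed.
        with self._locks[index]:
            self._shards[index].setdefault(network, {})[next_hop] = tuple(as_path)

    def routes(self, prefix):
        network = ipaddress.ip_network(prefix)
        index = self._shard_index(network)
        with self._locks[index]:
            return dict(self._shards[index].get(network, {}))


rib = ShardedRIB(shard_count=4)
updates = [
    ("2001:db8:1::/48", "2001:db8::3:1", [65402]),
    ("2001:db8:1::/48", "2001:db8::4:1", [65402]),
    ("2001:db8:1::/48", "2001:db8::5:1", [65401, 65402]),
]
# Apply the updates from several threads to exercise the per-shard locks.
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(lambda update: rib.add_route(*update), updates))
```

Because every route for a given prefix hashes to the same shard, an update never needs more than one lock. Updates for prefixes in different shards do not contend with each other, which is the property sharding is meant to provide.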
Akvorado connects 2 elements to build its RIB:
In the diagram above, the RIB stores five IPv4 prefixes and two IPv6 prefixes.
One of them, 2001:db8:1::/48, contains three routes:
· 2001:db8::3:1, AS 65402, AS path 65402, community 65402:31,
· 2001:db8::4:1, same ASN, AS path, and community,
· 2001:db8::5:1, AS 65402, AS path 65401 65402 Continue reading