Category Archives for "Networking"

Serverless Statusphere: a walk through building serverless ATProto applications on Cloudflare’s Developer Platform

Social media users are tired of losing their identity and data every time a platform shuts down or pivots. In the ATProto ecosystem — short for Authenticated Transfer Protocol — users own their data and identities. Everything they publish becomes part of a global, cryptographically signed shared social web. Bluesky is the first big example, but a new wave of decentralized social networks is just beginning. In this post I’ll show you how to get started, by building and deploying a fully serverless ATProto application on Cloudflare’s Developer Platform.

Why serverless? The overhead of managing VMs, scaling databases, maintaining CI pipelines, distributing data across availability zones, and securing APIs against DDoS attacks pulls focus away from actually building.

That’s where Cloudflare comes in. You can take advantage of our Developer Platform to build applications that run on our global network: Workers deploy code globally in milliseconds, KV provides fast, globally distributed caching, D1 offers a distributed relational database, and Durable Objects manage WebSockets and handle real-time coordination. Best of all, everything you need to build your serverless ATProto application is available on our free tier, so you can get started without spending a cent. You can find the code in Continue reading

Building Jetflow: a framework for flexible, performant data pipelines at Cloudflare

The Cloudflare Business Intelligence team manages a petabyte-scale data lake and ingests thousands of tables every day from many different sources. These include internal databases such as Postgres and ClickHouse, as well as external SaaS applications such as Salesforce. These ingestion tasks are often complex, and tables may gain hundreds of millions or billions of rows of new data each day. They are also business-critical for product decisions, growth planning, and internal monitoring. In total, about 141 billion rows are ingested every day.

As Cloudflare has grown, the data has become ever larger and more complex. Our existing Extract Load Transform (ELT) solution could no longer meet our technical and business requirements. After evaluating other common ELT solutions, we concluded that their performance generally did not surpass our current system, either.

It became clear that we needed to build our own framework to cope with our unique requirements — and so Jetflow was born. 

What we achieved

Over 100x efficiency improvement in GB-s:

  • Our longest-running job, with 19 billion rows, used to take 48 hours and 300 GB of memory; it now completes in 5.5 hours using 4 GB of memory

  • We estimate that ingestion of Continue reading
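As a rough check on the headline figure, the numbers quoted for the longest-running job can be turned into GB-s directly. This is only a sketch: it assumes GB-s means memory held multiplied by wall-clock seconds, and the function name `gb_seconds` is made up for illustration.

```python
# Memory-time cost of the longest-running job, using the figures quoted above.
# Assumption: "GB-s" = memory footprint (GB) x wall-clock duration (seconds).
SECONDS_PER_HOUR = 3600


def gb_seconds(hours: float, memory_gb: float) -> float:
    """Return the memory-time product for one job run."""
    return hours * SECONDS_PER_HOUR * memory_gb


before = gb_seconds(hours=48, memory_gb=300)  # previous ELT solution
after = gb_seconds(hours=5.5, memory_gb=4)    # same job on Jetflow
improvement = before / after

print(f"before: {before:,.0f} GB-s")
print(f"after:  {after:,.0f} GB-s")
print(f"improvement: {improvement:.0f}x")
```

Under that assumption, this single job improves by roughly 650x, which is consistent with the "over 100x" claim.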

Ultra Ethernet: Reinventing X.25

One should never trust the technical details published by the industry press, but assuming the Tomahawk Ultra puff piece isn’t too far off the mark, the new Broadcom ASIC (supposedly loosely based on emerging Ultra Ethernet specs):

  1. Uses Optimized Ethernet Header, replacing IP/UDP header with a 10-byte something (let’s call it session identifier)
  2. Makes Ethernet lossless with hop-by-hop retransmission/error recovery
  3. Uses credit-based flow control (the receiver continuously updates the sender about the amount of available space)

If you’re ancient enough, you might recognize #3 as part of Fibre Channel, #2 and #3 as part of IEEE 802.2 LLC2 (used by IBM to implement SNA over Token Ring and Ethernet), and all three as the fundamental ideas of X.25 that Broadcom obviously reinvented at 800 Gbps speeds, proving (yet again) RFC 1925 Rule 11.
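For readers who never met credit-based flow control, here is a minimal sketch of the idea in point 3. It is a toy model, not how Tomahawk Ultra, Fibre Channel, or X.25 implement it; the `Receiver` and `Sender` classes and their methods are invented for illustration. The sender may only transmit while it holds credits, and the receiver hands out new credits as it frees buffer space.

```python
from collections import deque


class Receiver:
    """Toy receiver with a fixed number of buffer slots (hypothetical model)."""

    def __init__(self, buffer_slots: int) -> None:
        self.capacity = buffer_slots
        self.buffer = deque()

    def initial_credits(self) -> int:
        # Every empty slot is advertised to the sender as one credit.
        return self.capacity

    def accept(self, frame: str) -> None:
        if len(self.buffer) >= self.capacity:
            raise OverflowError("sender transmitted without credit")
        self.buffer.append(frame)

    def drain(self, count: int) -> int:
        """Consume up to `count` frames and return the freed slots as credits."""
        freed = min(count, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class Sender:
    """Toy sender that transmits only while it holds credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits
        self.pending = deque()

    def queue(self, frame: str) -> None:
        self.pending.append(frame)

    def grant(self, credits: int) -> None:
        self.credits += credits

    def transmit(self, receiver: Receiver) -> int:
        """Send queued frames until credits or frames run out; return the count."""
        sent = 0
        while self.pending and self.credits > 0:
            receiver.accept(self.pending.popleft())
            self.credits -= 1
            sent += 1
        return sent


receiver = Receiver(buffer_slots=2)
sender = Sender(credits=receiver.initial_credits())
for i in range(5):
    sender.queue(f"frame-{i}")

first_burst = sender.transmit(receiver)   # stops when the credits are used up
sender.grant(receiver.drain(1))           # receiver frees one slot
second_burst = sender.transmit(receiver)  # exactly one more frame fits
print(first_burst, second_burst, len(sender.pending))
```

Because the sender never transmits without a credit, the receiver's buffer cannot overflow, so nothing is dropped due to congestion.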

Don’t Let AI Make You Circuit City

I have a little confession. Sometimes I like to go into Best Buy and just listen. I pretend to be shopping for modem bearings or a left-handed torque wrench. What I’m really doing is hearing how people sell computers. I remember when 8x CD burners were all the rage. I recall picking one particular machine because it had an integrated Sound Blaster card. Today, I just marvel at how the associates rattle off a long string of impressive-sounding nonsense that consumers will either buy hook, line, and sinker or refute based on some YouTube reviewer’s recommendation. Every once in a while, though, I hear someone who actually does understand the lingo, and it is wonderful. They listen and understand the challenges and don’t sell a $3,000 gaming computer to a grandmother just to play Candy Crush and look up grandkid photos on Facebook.

The Experience Matters

What does that story have to do with the title of this post? Well, dear young readers, you may not remember the time when Best Buy Blue was locked in mortal competition with Circuit City Red. In a time before Amazon was ascendant you had to pick between the two giants of Continue reading

Cloudflare protects against critical SharePoint vulnerability, CVE-2025-53770

On July 19, 2025, Microsoft disclosed CVE-2025-53770, a critical zero-day Remote Code Execution (RCE) vulnerability. Assigned a CVSS 3.1 base score of 9.8 (Critical), the vulnerability affects SharePoint Server 2016, 2019, and the Subscription Edition, along with unsupported 2010 and 2013 versions. Cloudflare’s WAF Managed Rules now includes 2 emergency releases that mitigate these vulnerabilities for WAF customers.

Unpacking CVE-2025-53770

The vulnerability's root cause is improper deserialization of untrusted data, which allows a remote, unauthenticated attacker to execute arbitrary code over the network without any user interaction. Moreover, what makes CVE-2025-53770 uniquely threatening is its methodology: the exploit chain, labeled "ToolShell." ToolShell is engineered to play the long game. Attackers are not only gaining temporary access, but also taking the server's cryptographic machine keys, specifically the ValidationKey and DecryptionKey. Possessing these keys allows threat actors to independently forge authentication tokens and __VIEWSTATE payloads, granting them persistent access that can survive standard mitigation strategies such as a server reboot or removing web shells.

In response to the active nature of these attacks, the U.S. Cybersecurity and Infrastructure Security Agency (CISA) added CVE-2025-53770 to its Known Exploited Vulnerabilities (KEV) catalog with an emergency remediation deadline. Continue reading

Shutdown season: the Q2 2025 Internet disruption summary

Cloudflare’s network currently spans more than 330 cities in over 125 countries, and we interconnect with over 13,000 network providers in order to provide a broad range of services to millions of customers. The breadth of both our network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the impact of Internet disruptions at both a local and national level, as well as at a network level.

As we have noted in the past, this post is intended as a summary overview of observed and confirmed disruptions, and is not an exhaustive or complete list of issues that have occurred during the quarter. A larger list of detected traffic anomalies is available in the Cloudflare Radar Outage Center. Note that both bytes-based and request-based traffic graphs are used within the post to illustrate the impact of the observed disruptions — the choice of metric was generally made based on which better illustrated the impact of the disruption.

In our Q1 2025 summary post, we noted that we had not observed any government-directed Internet shutdowns during the quarter. Unfortunately, that forward progress was short-lived — in the second quarter of 2025, we Continue reading

Always Check Your Tests Against Faulty Inputs

A while ago, I published a blog post proudly describing the netlab integration test that should check for incorrect OSPF network types in netlab-generated device configurations. Almost immediately, Erik Auerswald pointed out that my test wouldn’t detect that error (it might detect other errors, though) as the OSPF network adjacency is always established even when the adjacent routers have mismatching OSPF network types.

I made one of the oldest testing mistakes: I checked whether my test would work under the correct conditions but not whether it would detect an incorrect condition.
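The mistake is easy to reproduce in miniature. The sketch below is not netlab's validation code; the dictionaries and check functions are hypothetical stand-ins. It shows why a check has to be run against a deliberately broken input as well as a correct one.

```python
# Hypothetical device state; the field names are invented for illustration.
good = {"adjacency": "Full", "network_type": "point-to-point"}

# Faulty input: wrong network type, but the adjacency is still established,
# which is exactly the situation described above.
faulty = {"adjacency": "Full", "network_type": "broadcast"}


def adjacency_check(device: dict) -> bool:
    """Flawed check: only verifies that the OSPF adjacency is up."""
    return device["adjacency"] == "Full"


def network_type_check(device: dict) -> bool:
    """Stricter check: compares the reported network type with the expected one."""
    return device["network_type"] == "point-to-point"


def detects_fault(check) -> bool:
    """A check is useful only if it passes on good input AND fails on faulty input."""
    return check(good) and not check(faulty)


print("adjacency_check detects the fault:", detects_fault(adjacency_check))
print("network_type_check detects the fault:", detects_fault(network_type_check))
```

Both checks pass on the correct configuration; only the second one fails on the faulty one, and that is the half of the validation I skipped.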

How the Free Software Foundation Battles the LLM Bots

Ian Kelling points out that the infrastructure for the Free Software Foundation “has been under attack since August 2024.” “Nothing has changed since the article,” FSF sysadmin […] a report from LibreNews noting similar issues at high-profile FOSS sites including the Fedora project, KDE GitLab infrastructure, the GNOME GitLab instance, Diaspora, and even the FOSS news site Linux Weekly News. (And “GNOME has been experiencing issues since last November…”) Articles like the FSF’s are a way of sharing “techniques and tools”, McMahon said Tuesday. He adds, though, that some system administrators also have a private mailing list “where we can coordinate and share effective strategies. The specific mitigations often cannot be published because that would give our attackers an advantage.” There’s a lot to learn from the FSF’s battle against the bots: about the tactics of sysadmins, but also about Continue reading

Immich Setup with Docker & External Library (NFS)

Recently, I started self-hosting most of the apps I use, like Memos for note-taking and Paperless-NGX for document management. The next one on the list was Immich. Immich is a self-hosted photo and video backup solution that supports features like facial recognition and automatic uploads.

In this post, we’ll look at how to set up Immich as a Docker container and also how to add an NFS share as an external library.

But, Why?

I have a lot of pictures on my NAS that I’ve collected over the years. This includes photos of friends, family, and ones from my older phones. I wanted a way to manage and organise them from one place. I also didn’t want to upload all of them to Google or Apple, which would cost quite a bit. Continue reading

Cisco IOS/XE Hates Redistributed Static IPv6 Routes

Writing tests that check the correctness of network device configurations is hard (overview, more details). It’s also an interesting exercise in getting the timing just right:

  • Routing protocols are an eventually-consistent distributed system, and things eventually appear in the right place (if you got the configurations right), but you never know when exactly that will happen.
  • You can therefore set some reasonable upper bounds on when things should happen, and declare failure if the timeouts are exceeded. Even then, you’ll get false positives (as in: the test is telling you the configurations are incorrect, when it’s just a device having a bad hair day).

And just when you think you nailed it, you encounter a device that blows your assumptions out of the water.
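The "reasonable upper bound" approach usually boils down to a polling loop like the one below. This is a generic sketch rather than the actual test code; `wait_until` and `FakeRoutingTable` are hypothetical names, and the fake table merely simulates a route that shows up after a few polls.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False  # declare failure: the upper bound was exceeded
        time.sleep(interval)


class FakeRoutingTable:
    """Stand-in for a device: the route appears only after a few polls."""

    def __init__(self, converge_after: int) -> None:
        self.converge_after = converge_after
        self.polls = 0

    def has_route(self) -> bool:
        self.polls += 1
        return self.polls > self.converge_after


table = FakeRoutingTable(converge_after=3)
converged = wait_until(table.has_route, timeout=1.0, interval=0.01)
never = wait_until(lambda: False, timeout=0.05, interval=0.01)
print(converged, table.polls, never)
```

The false positives mentioned above live in the `return False` branch: a slow device and a misconfigured one look identical once the deadline passes.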

Quicksilver v2: evolution of a globally distributed key-value store (Part 2)

What is Quicksilver?

Cloudflare has servers in 330 cities spread across 125+ countries. All of these servers run Quicksilver, which is a key-value database that contains important configuration information for many of our services, and is queried for all requests that hit the Cloudflare network.

Because it is used while handling requests, Quicksilver is designed to be very fast; it currently responds to 90% of requests in less than 1 ms and 99.9% of requests in less than 7 ms. Most requests are only for a few keys, but some are for hundreds or even more keys.

Quicksilver currently contains over five billion key-value pairs with a combined size of 1.6 TB, and it serves over three billion keys per second, worldwide. Keeping Quicksilver fast provides some unique challenges, given that our dataset is always growing, and new use cases are added regularly.

Quicksilver used to store all key-values on all servers everywhere, but there is obviously a limit to how much disk space can be used on every single server. For instance, the more disk space used by Quicksilver, the less disk space is left for content caching. Also, with each added server that contains a particular Continue reading

Explore your Cloudflare data with Python notebooks, powered by marimo

Many developers, data scientists, and researchers do much of their work in Python notebooks: they’ve been the de facto standard for data science and sharing for well over a decade. Notebooks are popular because they make it easy to code, explore data, prototype ideas, and share results. We use them heavily at Cloudflare, and we’re seeing more and more developers use notebooks to work with data, from analyzing trends in HTTP traffic and querying Workers Analytics Engine through to querying their own Iceberg tables stored in R2.

Traditional notebooks are incredibly powerful, but they were not built with collaboration, reproducibility, or deployment as data apps in mind. As usage grows across teams and workflows, these limitations collide with the reality of work at scale.

marimo reimagines the notebook experience with these challenges in mind. It’s an open-source reactive Python notebook that’s built to be reproducible, easy to track in Git, executable as a standalone script, and deployable. We have partnered with the marimo team to bring this streamlined, production-friendly experience to Cloudflare developers. Spend less time wrestling with tools and more time exploring your data.

Today, we’re excited to announce three things:

Demystifying Ultra Ethernet

The Ultra Ethernet Consortium (UEC), of which Arista is a founding member, is a standards organisation established to enhance Ethernet for the demanding requirements of Artificial Intelligence (AI) and High-Performance Computing (HPC). Over 100 member companies and 1000 participants have collaborated to evolve Ethernet, leading to the recent publication of its 1.0 specification, which will drive hardware implementations that significantly boost cluster performance.

Cloudflare 1.1.1.1 incident on July 14, 2025

On 14 July 2025, Cloudflare made a change to our service topologies that caused an outage for 1.1.1.1 on the edge, resulting in downtime for 62 minutes for customers using the 1.1.1.1 public DNS Resolver as well as intermittent degradation of service for Gateway DNS.

Cloudflare's 1.1.1.1 Resolver service became unavailable to the Internet starting at 21:52 UTC and ending at 22:54 UTC. The majority of 1.1.1.1 users globally were affected. For many users, not being able to resolve names using the 1.1.1.1 Resolver meant that basically all Internet services were unavailable. This outage can be observed on Cloudflare Radar.

The outage occurred because of a misconfiguration of legacy systems used to maintain the infrastructure that advertises Cloudflare’s IP addresses to the Internet.

This was a global outage. During the outage, Cloudflare's 1.1.1.1 Resolver was unavailable worldwide.

We’re very sorry for this outage. The root cause was an internal configuration error and not the result of an attack or a BGP hijack. In this blog, we’re going to talk about what the failure was, why it occurred, and what we’re doing to Continue reading

Cloudflare recognized as a Visionary in 2025 Gartner® Magic Quadrant™ for SASE Platforms

We are thrilled to announce that Cloudflare has been named a Visionary in the 2025 Gartner® Magic Quadrant™ for Secure Access Service Edge (SASE) Platforms report. We view this evaluation as a significant recognition of our strategy to help connect and secure workspace security and coffee shop networking through our unique connectivity cloud approach. You can read more about our position in the report here.

Since launching Cloudflare One, our SASE platform, we have delivered hundreds of features and capabilities, from our lightweight branch connector and intuitive native Data Loss Prevention (DLP) service to our new secure infrastructure access tools. By operating the world’s most powerful, programmable network, we’ve built an incredible foundation to deliver a comprehensive SASE platform.

Today, we operate the world's most expansive SASE network in order to deliver connectivity and security close to where users and applications are, anywhere in the world. We’ve developed our services from the ground up to be fully integrated and run on every server across our network, delivering a unified experience to our customers. And we enable these services with a unified control plane, enabling end-to-end visibility and control anywhere in the world. Tens of thousands of customers Continue reading

Dry Run: Your Kubernetes network policies with Calico staged network policies

Kubernetes Network Policies (KNP) are powerful resources that help secure and isolate workloads in a cluster. By defining what traffic is allowed to and from specific pods, KNPs provide the foundation for zero-trust networking and least-privilege access in cloud-native environments.

But there’s a problem: KNPs are risky, and applying them without a clear game plan can be disruptive.

Without deep insight into existing traffic flows, applying a restrictive policy can instantly break connectivity, killing live workloads, user sessions, or critical app dependencies. An even scarier scenario is when we implement policies that we think cover everything and workloads appear to work, but after a restart or scaling operation we hit new problems. Kubernetes, for all of its features, has no built-in “dry run” mode for policies and no first-class observability to show what would be blocked or allowed. That is arguably the right decision, since Kubernetes is an orchestrator, not an implementer.

This forces platform teams into a difficult choice: deploy permissive or no policies and weaken security, or risk service disruption while debugging restrictive ones. As a result, many teams delay implementing network policies entirely, only to regret it after a zero-day exploit like Log4Shell, XZ backdoor, or other vulnerabilities Continue reading
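To make the idea of a policy "dry run" concrete, here is a small conceptual sketch. It is not Calico's API (staged policies are Kubernetes resources, not Python objects); the `Flow` and `AllowRule` types and the workload names are invented for illustration. It evaluates a candidate allow-list against flows already observed in the cluster and reports what would be denied, without blocking anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed connection between workloads (hypothetical model)."""
    source: str
    destination: str
    port: int


@dataclass(frozen=True)
class AllowRule:
    """One allow rule in the candidate policy (hypothetical model)."""
    source: str
    destination: str
    port: int


def dry_run(flows: list, rules: list) -> list:
    """Return the observed flows the candidate policy would deny.

    Assumes default-deny semantics: anything not explicitly allowed is denied.
    Nothing is enforced here; the result is only a report.
    """
    allowed = {(r.source, r.destination, r.port) for r in rules}
    return [f for f in flows if (f.source, f.destination, f.port) not in allowed]


observed = [
    Flow("frontend", "backend", 8080),
    Flow("backend", "db", 5432),
    Flow("cronjob", "db", 5432),  # easy to forget when writing the policy
]
candidate = [
    AllowRule("frontend", "backend", 8080),
    AllowRule("backend", "db", 5432),
]

would_deny = dry_run(observed, candidate)
for flow in would_deny:
    print(f"would deny: {flow.source} -> {flow.destination}:{flow.port}")
```

The forgotten `cronjob` flow shows up in the report instead of in an incident, which is the whole point of staging a policy before enforcing it.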