Redesigning Workers KV for increased availability and faster performance

On June 12, 2025, Cloudflare suffered a significant service outage that affected a large set of our critical services. As explained in our blog post about the incident, the cause was a failure in the underlying storage infrastructure used by our Workers KV service. Workers KV is not only relied upon by many customers but also serves as critical infrastructure for many other Cloudflare products, handling configuration, authentication, and asset delivery across the affected services. Part of this infrastructure was backed by a third-party cloud provider, which experienced an outage on June 12 that directly impacted the availability of our KV service.

Today we're providing an update on the improvements that have been made to Workers KV to ensure that a similar outage cannot happen again. We are now storing all data on our own infrastructure. We are also serving all requests from our own infrastructure in addition to any third-party cloud providers used for redundancy, ensuring high availability and eliminating single points of failure. Finally, the work has meaningfully improved performance and set a clear path for the removal of any reliance on third-party providers as redundant back-ups.
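Serving reads from more than one storage provider so that no single provider is a single point of failure can be illustrated with a generic failover read. This is a minimal sketch, not Cloudflare's implementation: the backend callables, the store contents, and the `BackendUnavailable` exception are hypothetical stand-ins, and the function simply tries each backend in order until one answers.

```python
from typing import Callable, List, Optional, Sequence


class BackendUnavailable(Exception):
    """Raised by a hypothetical storage backend that cannot serve a read."""


# A backend is any callable mapping a key to its value (None if the key is absent).
Backend = Callable[[str], Optional[bytes]]


def read_with_failover(key: str, backends: Sequence[Backend]) -> Optional[bytes]:
    """Return the value from the first backend that answers.

    A backend that raises BackendUnavailable is skipped, so the read only
    fails if every backend is unavailable.
    """
    errors: List[BackendUnavailable] = []
    for backend in backends:
        try:
            return backend(key)
        except BackendUnavailable as exc:
            errors.append(exc)  # record the failure and try the next backend
    raise BackendUnavailable(f"all {len(backends)} backends failed: {errors}")


def primary(key: str) -> Optional[bytes]:
    # Hypothetical provider that is currently suffering an outage.
    raise BackendUnavailable("primary store is down")


def secondary(key: str) -> Optional[bytes]:
    # Hypothetical healthy provider holding a copy of the data.
    store = {"config/flags": b'{"beta": true}'}
    return store.get(key)


value = read_with_failover("config/flags", [primary, secondary])
```

The sketch covers only read-side failover; a real system also has to keep the copies consistent on writes and would likely query backends concurrently to avoid paying the timeout of a failed provider on every read.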

Background: The Original Architecture

Workers KV is a global key-value store that Continue reading

Technology Short Take 187

Welcome to Technology Short Take #187! In this Technology Short Take, I have a curated collection of links on topics ranging from BGP to blade server hardware to writing notes using a “zettelkasten”-style approach, along with a few other topics thrown in here and there for fun. I hope you find something useful!

Networking

Servers/Hardware

Security

Cloud Computing/Cloud Management

  • I’ve spoken about Cedar before here on this site. The first mention of Continue reading

AWS Direct Connect Technical Deep Dive (IX)


In the previous posts, we looked at how to use a site‑to‑site VPN to connect your on‑premises network to AWS, and as we saw, it is very easy to set up. So what’s the fuss about Direct Connect (DX), and why would we need one?

In short, a VPN connects through the Internet, and as you would expect, that comes with some limitations. Latency can be high, and throughput is capped at around 1.25 Gb/s per tunnel. So what if we need something more resilient, with much higher throughput?

That is where AWS Direct Connect comes in. As the name suggests, it is a dedicated, direct network connection to AWS (a DX connection), offering better performance and reliability than a traditional VPN over the Internet.

As always, if you find this post helpful, press the ‘clap’ button. It means a lot to me and helps Continue reading

TCG055: Building Developer-First Identity Solutions with Brian Pontarelli

Today we explore how to build sustainable tech companies with Brian Pontarelli, Founder of FusionAuth. Brian shares his path from early programming on an Apple IIe to creating innovative solutions in the complex world of customer identity and access management (CIAM). Brian argues that single-tenancy and local development capabilities are crucial for developers. He also... Read more »

NAN097: Automating Optical Networks

Optical networks are an essential component of networking, but don’t get much attention. Today we shine a spotlight on the intersection of optical networks and the software that automates them. Our guest is Michal Pecek, consultant and teacher in optical communication, whose work has transformed organizations including Google and Alcatel-Lucent (now Nokia). From pioneering flexible DWDM... Read more »

Congestion Control at IETF 123

As usual, IETF 123 was a busy week for DNS folk. I'll cover the material presented at the DELEG and DNSOP working groups. There is more to the DNS at IETF meetings than just these two working groups, and I'll skip over Adaptive DNS Discovery (ADD), Extensions for Scalable DNS Service Discovery (DNSSD), and DANE Authentication for Network Clients Everywhere (DANCE) in the interests of trying to keep this report (relatively) brief!

Partnering with OpenAI to bring their new open models onto Cloudflare Workers AI

OpenAI has just announced their latest open-weight models — and we are excited to share that we are working with them as a Day 0 launch partner to make these models available in Cloudflare's Workers AI. Cloudflare developers can now access OpenAI's first open model, leveraging these powerful new capabilities on our platform. The new models are available starting today at @cf/openai/gpt-oss-120b and @cf/openai/gpt-oss-20b.
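The model identifiers above are what you pass to Workers AI. As an illustration only, the sketch below uses Python's standard library to build, but not send, a request to the Workers AI REST endpoint (`https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}`). The account ID and API token are placeholders, and the `instructions`/`input` payload fields are an assumption modeled on the Responses API style; check the model's input schema in the Workers AI documentation before relying on them.

```python
import json
import urllib.request

# Model identifiers as given in the announcement.
GPT_OSS_120B = "@cf/openai/gpt-oss-120b"
GPT_OSS_20B = "@cf/openai/gpt-oss-20b"


def build_workers_ai_request(account_id: str, api_token: str, model: str,
                             prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Workers AI REST API.

    ASSUMPTION: the payload uses "instructions"/"input" fields; verify
    against the model's published input schema.
    """
    url = (f"https://api.cloudflare.com/client/v4/accounts/"
           f"{account_id}/ai/run/{model}")
    body = json.dumps({
        "instructions": "You are a concise assistant.",
        "input": prompt,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials: substitute a real account ID and API token.
req = build_workers_ai_request("ACCOUNT_ID", "API_TOKEN", GPT_OSS_20B,
                               "Summarize mixture-of-experts in one sentence.")
```

Passing `req` to `urllib.request.urlopen` with real credentials would send it; code running inside a Worker would more typically call the model through the Workers AI binding rather than the REST API.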

Workers AI has always been a champion for open models and we’re thrilled to bring OpenAI's new open models to our platform today. Developers who want transparency, customizability, and deployment flexibility can rely on Workers AI as a place to deliver AI services. Enterprises that need the ability to run open models to ensure complete data security and privacy can also deploy with Workers AI. We are excited to join OpenAI in fulfilling their mission of making the benefits of AI broadly accessible to builders of any size.

The technical model specs

The OpenAI models have been released in two sizes: a 120 billion parameter model and a 20 billion parameter model. Both of them are Mixture-of-Experts models – a popular architecture for recent model releases – that allow relevant experts to be called for a Continue reading
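Mixture-of-Experts routing can be sketched in a few lines: a gating function scores every expert for a given input, only the top-k experts actually run, and their outputs are mixed using the gate scores renormalized over the chosen experts. This is a generic, textbook-style illustration in plain Python, not the gpt-oss architecture; the gate weights, the three toy experts, and `top_k=2` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector],
                experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Generic top-k MoE layer; assumes each expert preserves the input length."""
    # 1. Gate: one linear score per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # 2. Select the top_k experts by gate score.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # 3. Renormalize the gate scores over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    # 4. Evaluate only the chosen experts and mix their outputs.
    out = [0.0] * len(x)
    for weight, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts and gate rows, for illustration only.
experts: List[Expert] = [
    lambda v: [2.0 * a for a in v],  # expert 0: doubles its input
    lambda v: [a + 1.0 for a in v],  # expert 1: adds one
    lambda v: [0.0 for _ in v],      # expert 2: outputs zeros
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = moe_forward([1.0, 2.0], gate_weights, experts, top_k=2)
```

For this input the gate scores are 1, 2, and -3, so experts 1 and 0 run and expert 2 is never evaluated; skipping the unselected experts is what lets an MoE model have many parameters while computing with only a fraction of them per token.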

PP073: Identify Yourself: Authentication From SAML to FIDO2

From SAML to OAuth to FIDO2 to passwordless promises, we unpack what’s working—and what’s broken—in the world of identity and authentication. Today on the Packet Protector podcast, we’re joined by the always thoughtful and occasionally provocative Wolf Goerlich, former Duo advisor, and now a practicing CISO in the public sector. We also talk about authorization... Read more »

How 1&1 Mail & Media Scaled Kubernetes Networking with eBPF and Calico

“We started in 2017 with Calico and never regretted it!”
—Stefan Fudeus, Product Owner/Lead Architect, 1&1 Mail & Media

Challenge

1&1 Mail & Media, part of the IONOS group, powers popular European internet brands including GMX and Web.de, serving more than 50% of Germany’s population with critical identity and email infrastructure. With roughly 45 to 50 million users, network reliability is non-negotiable. Any downtime could affect millions.

By 2022, the company had containerized 80% of its workloads on Kubernetes across three self-managed data centers. While the platform, backed by bare metal nodes and custom network layers, was highly scalable, network throughput bottlenecks began to emerge. Pods were limited to 2.5 Gbps of bandwidth due to IP encapsulation overhead, despite 10 Gbps network interfaces.

The team needed a solution that:

  • Improved pod-to-pod network performance
  • Maintained strong network policy isolation across up to 40 tenants per cluster
  • Scaled to millions of network connections and 1.4 million HTTP requests per second

Solution

1&1 Mail & Media had adopted Calico back in 2017, largely for its unique support of the Kubernetes NetworkPolicy standard. As their Kubernetes platform evolved, with clusters scaling to 300 bare metal nodes, 16,000 pods, and over 4 million Continue reading

IBM Outlines Steps To Verify Claims Of Quantum Advantage

D-Wave executives stirred up some controversy earlier this year when they claimed that a smaller version of the company’s Advantage 2 annealing quantum system, armed with 1,200 qubits, had reached “quantum supremacy” (or “quantum advantage”), the significant but ill-defined point at which a quantum system can solve a problem much faster, at a lower cost, or more efficiently than the most powerful classical supercomputer.

IBM Outlines Steps To Verify Claims Of Quantum Advantage was written by Jeffrey Burt at The Next Platform.

Reducing double spend latency from 40 ms to < 1 ms on privacy proxy

One of Cloudflare’s big focus areas is making the Internet faster for end users. Part of the way we do that is by looking at the "big rocks" or bottlenecks that might be slowing things down — particularly processes on the critical path. When we recently turned our attention to our privacy proxy product, we found a big opportunity for improvement.

What is our privacy proxy product? These proxies let users browse the web without exposing their personal information to the websites they’re visiting. Cloudflare runs infrastructure for privacy proxies like Apple’s Private Relay and Microsoft’s Edge Secure Network.

Like any secure infrastructure, we make sure that users authenticate to these privacy proxies before we open up a connection to the website they’re visiting. In order to do this in a privacy-preserving way (so that Cloudflare collects the least possible information about end-users) we use an open Internet standard – Privacy Pass – to issue tokens that authenticate to our proxy service.

Every time a user visits a website via our privacy proxy, we check the validity of the Privacy Pass token included in the Proxy-Authorization header of their request. Before we cryptographically validate a user’s token, we check Continue reading
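A double-spend check ensures that a Privacy Pass token is redeemed only once. The excerpt does not describe how Cloudflare performs that check, so the following is only a generic, single-process sketch. It assumes the check is keyed on the token's random nonce (Privacy Pass tokens carry one) and that a nonce can be forgotten once the token's validity window, modeled here by a hypothetical `ttl_seconds`, has passed. A real proxy fleet would need this state shared across machines, which this sketch does not attempt.

```python
import time
from typing import Dict, Optional


class DoubleSpendCache:
    """Hypothetical in-memory record of token nonces that were already redeemed.

    A nonce is accepted only the first time it is seen within ttl_seconds.
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._seen: Dict[bytes, float] = {}  # nonce -> expiry timestamp

    def redeem(self, nonce: bytes, now: Optional[float] = None) -> bool:
        """Return True if the nonce is fresh (and record it), False on a replay."""
        now = time.monotonic() if now is None else now
        expiry = self._seen.get(nonce)
        if expiry is not None and expiry > now:
            return False  # already spent within its validity window
        self._seen[nonce] = now + self.ttl
        return True

    def purge(self, now: Optional[float] = None) -> None:
        """Drop expired entries so memory stays bounded."""
        now = time.monotonic() if now is None else now
        self._seen = {n: e for n, e in self._seen.items() if e > now}


cache = DoubleSpendCache(ttl_seconds=60.0)
first = cache.redeem(b"nonce-1", now=0.0)    # True: first redemption
replay = cache.redeem(b"nonce-1", now=10.0)  # False: double spend rejected
```

Explicit `now` arguments keep the example deterministic; in a running proxy the default monotonic clock would be used.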

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives

We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences. We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring — or sometimes failing to even fetch — robots.txt files.

The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.

How we tested

We received complaints from customers who had both disallowed Perplexity crawling activity in their robots.txt files and also created WAF rules to specifically block both of Perplexity’s declared crawlers: PerplexityBot and Perplexity-User. Continue reading
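The directives at issue are ordinary robots.txt rules. As a sketch, Python's standard `urllib.robotparser` shows how a well-behaved crawler evaluates them before fetching a page. The robots.txt content, the example.com URL, and the `ExampleBot` agent are hypothetical; `PerplexityBot` and `Perplexity-User` are the declared agents named in the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of both declared Perplexity
# user agents while allowing every other crawler.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("PerplexityBot", "Perplexity-User", "ExampleBot"):
    allowed = parser.can_fetch(agent, "https://example.com/articles/1")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
# PerplexityBot: disallowed
# Perplexity-User: disallowed
# ExampleBot: allowed
```

Because robots.txt matching depends on the crawler identifying itself honestly, a crawler that presents an undeclared user agent falls through to the `*` group, which is why the customers in the post also relied on WAF rules.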

NB537: Palo Alto Networks IDs New Market With $25 Billion CyberArk Buy; Intel to Shed Networking Biz

Take a Network Break! Guest opinionator Tom Hollingsworth joins Johna Johnson to opine on the latest tech news. On the vulnerability front, several versions of BentoML are vulnerable to server-side request forgery (SSRF). Looking at tech news, Intel will spin out its networking and edge group as it continues cost-cutting, Palo Alto Networks makes... Read more »