Cloudflare service outage June 12, 2025

On June 12, 2025, Cloudflare suffered a significant service outage that affected a large set of our critical services, including Workers KV, WARP, Access, Gateway, Images, Stream, Workers AI, Turnstile and Challenges, AutoRAG, Zaraz, and parts of the Cloudflare Dashboard.

This outage lasted 2 hours and 28 minutes, and globally impacted all Cloudflare customers using the affected services. The cause of this outage was due to a failure in the underlying storage infrastructure used by our Workers KV service, which is a critical dependency for many Cloudflare products and relied upon for configuration, authentication and asset delivery across the affected services. Part of this infrastructure is backed by a third-party cloud provider, which experienced an outage today and directly impacted availability of our KV service.

We’re deeply sorry for this outage: this was a failure on our part, and while the proximate cause (or trigger) for this outage was a third-party vendor failure, we are ultimately responsible for our chosen dependencies and how we choose to architect around them.

This was not the result of an attack or other security event. No data was lost as a result of this incident. Cloudflare Magic Transit and Magic WAN, DNS, cache, proxy, Continue reading

Celebrating 11 years of Project Galileo’s global impact

June 2025 marks the 11th anniversary of Project Galileo, Cloudflare’s initiative to provide free cybersecurity protection to vulnerable organizations working in the public interest around the world. From independent media and human rights groups to community activists, Project Galileo supports those often targeted for their essential work in human rights, civil society, and democracy building.

A lot has changed since we marked the 10th anniversary of Project Galileo. Yet, our commitment remains the same: help ensure that organizations doing critical work in human rights have access to the tools they need to stay online.  We believe that organizations, no matter where they are in the world, deserve reliable, accessible protection to continue their important work without disruption.

For our 11th anniversary, we're excited to share several updates including:

  • An interactive Cloudflare Radar report providing insights into the cyber threats faced by at-risk public interest organizations protected under the project. 

  • An expanded commitment to digital rights in the Asia-Pacific region with two new Project Galileo partners.

  • New stories from organizations protected by Project Galileo working on the frontlines of civil society, human rights, and journalism from around the world.

Tracking and reporting on cyberattacks with the Project Galileo Continue reading

We shipped FinalizationRegistry in Workers: why you should never use it

We’ve recently added support for the FinalizationRegistry API in Cloudflare Workers. This API allows developers to request a callback when a JavaScript object is garbage-collected, a feature that can be particularly relevant for managing external resources, such as memory allocated by WebAssembly (Wasm). However, despite its availability, our general advice is: avoid using it directly in most scenarios.

Our decision to add FinalizationRegistry — while still cautioning against using it — opens up a bigger conversation: how memory management works when JavaScript and WebAssembly share the same runtime. This is becoming more common in high-performance web apps, and getting it wrong can lead to memory leaks, out-of-memory errors, and performance issues, especially in resource-constrained environments like Cloudflare Workers.

In this post, we’ll look at how JavaScript and Wasm handle memory differently, why that difference matters, and what FinalizationRegistry is actually useful for. We’ll also explain its limitations, particularly around timing and predictability, walk through why we decided to support it, and how we’ve made it safer to use. Finally, we’ll talk about how newer JavaScript language features offer a more reliable and structured approach to solving these problems.

Memory management 101

JavaScript

JavaScript relies on automatic memory management through a Continue reading

Secure and Scalable Kubernetes for Multi-Cluster Management

This story is becoming more and more common in the Kubernetes world. What starts as a manageable cluster or two can quickly balloon into a sprawling, multi-cluster architecture spanning public clouds, private data centers, or a bit of both. And with that growth comes a whole new set of headaches. How do you keep tabs on compliance across wildly different configurations? When a service goes down across multiple clusters, how do you pinpoint the cause amidst the chaos? And what about those hard-to-diagnose latency issues that seem to crop up between regions?

The truth is, achieving secure and scalable multi-cluster Kubernetes isn’t about throwing more tools at the problem. It’s about having the right tools and adopting the right best practices. This is where a solution like Calico Cluster Mesh shines, offering those essential capabilities for a seamless multi-cluster experience without the complexity or overhead that you expect with traditional service meshes.

The Multi-Cluster Challenge: When Complexity Takes Over

So, why are so many organizations finding themselves in this multi-cluster maze? Often, it’s driven by solid business reasons:

  • High Availability and Disaster Recovery: Spreading workloads across multiple regions or clusters means that if one goes down, your users shouldn’t notice.
  • Continue reading

AI Metrics with Grafana Cloud

The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The metrics include:

  • Total Traffic Total traffic entering fabric
  • Operations Total RoCEv2 operations broken out by type
  • Core Link Traffic Histogram of load on fabric links
  • Edge Link Traffic Histogram of load on access ports
  • RDMA Operations Total RDMA operations
  • RDMA Bytes Average RDMA operation size
  • Credits Average number of credits in RoCEv2 acknowledgements
  • Period Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
  • Congestion Total ECN / CNP congestion messages
  • Errors Total ingress / egress errors
  • Discards Total ingress / egress discards
  • Drop Reasons Packet drop reasons
AI Metrics with Prometheus and Grafana describes how to stand up an analytics stack with Prometheus and Grafana to track performance metrics for an AI/ML GPU cluster. This article shows how to integrate with Prometheus and Grafana hosted in the cloud, Grafana Cloud, instead of running the services locally.

Note: Grafana Cloud has a free service tier that can be used to test this example.

Continue reading

Getting Started with the Pytest Plugin for Infrahub

Getting Started with the Pytest Plugin for Infrahub

We all write code, but how do we know the changes we make in the future won’t break something that used to work? That’s where testing becomes important.

The idea is to catch problems early, ideally before they reach production. In the Python world, one of the most common ways to do this is with a tool called pytest. It lets you write tests to check that your code behaves the way you expect and helps you catch issues before they become a bigger problem.

Originally published under - https://www.opsmill.com/pytest-plugin-infrahub/

When working with Infrahub, testing is just as important. You might want to make sure your GraphQL queries are valid, your Jinja2 templates render correctly, or your transformations behave as expected.

Infrahub simplifies this by offering a pytest plugin that doesn’t require Python code; you define tests using plain YAML. This makes testing more accessible to teams across roles and speeds up the feedback loop during development.

These kinds of unit tests aren’t just about convenience, they help establish a production-ready automation system. With automated checks built into your process, every change is validated consistently, reducing the chance of something breaking unexpectedly. That consistency builds trust when your Continue reading

Publishing Content as an Introvert

I got an interesting question from a reader. He listened to my podcast with Eric Chou and decided to try to learn in public:

Currently, I’m studying for the CCNP ENARSI exam, and would like to start posting my labs to LinkedIn, and perhaps even upload my lab topologies and configs to Git.

That’s a great idea. I would minimize the LinkedIn part1 and focus on Git:

UGreen NASync DXP2800 Review and First Impressions

UGreen NASync DXP2800 Review and First Impressions

TL;DR - For anyone who doesn’t want to go through the full post, here’s the short version. I bought the UGreen NASync DXP2800 (2 bay) from Amazon for £249 and paired it with two Seagate Ironwolf 8TB HDDs, around £180 each.

The unit comes with an Intel N100 CPU, 8GB of RAM (upgradeable to 16GB, but there’s only one RAM slot), and a 2.5Gb/s LAN port. It has a solid build, was easy to set up, and I actually like the UI. Sure, it lacks a lot of features compared to Synology or QNAP, but since I’m mainly using it for file storage, I’m happy with the purchase.

Setting up Proxmox Backup Server
In this post, we’ll go through the process of setting up Proxmox Backup Server and backing up all the VMs from my Proxmox server to this backup server. So, let’s get to it.
UGreen NASync DXP2800 Review and First Impressions

But Why UGreen NAS?

The short answer is, this is the best bang for the buck. For £249, I’m getting a 2-bay NAS with an N100 CPU, 8GB of RAM, a 2.5Gb/s LAN port, and two NVMe slots.

I’ve been wanting to buy a NAS for over Continue reading

Dell’s Advice To Enterprises: Buy AI, Don’t Try To Build It

Unsurprisingly, the main topic of conversation at the recent Dell Technologies World 2025 event in Las Vegas was AI, and a central theme that wove through many of the messages we heard there was that adopting the emerging technology is much easier now than it was even a year ago.

Dell’s Advice To Enterprises: Buy AI, Don’t Try To Build It was written by Jeffrey Burt at The Next Platform.

Finding End-to-End Paths: Topology and Endpoints

We know there are three main ways to move packets across a network. However, before we can start forwarding packets, someone has to populate the forwarding tables in the intermediate devices or build the sequence of nodes to traverse in source routing.

Usually, whoever is responsible for the contents of the forwarding tables must first discover the network topology. Let’s start there, using the following network diagram to illustrate the discussion.

Juniper vJunos-router in Containerlab

Juniper vJunos-router in Containerlab

If you follow me or read my blog, you probably know I'm a big advocate of Containerlab. I've been using it for over two years now and I absolutely love it. Why? Because it's open source, it has an amazing community behind it (thank you again, Roman), and labs are defined using simple YAML files that are easy to share and reuse.

So far, I've used Cisco IOL, Arista EOS, and Palo Alto VM in Containerlab. And finally, the time came to try Juniper. I decided to test the Juniper vJunos-router, which is a virtualized MX router. It's a single-VM version of vMX that doesn't require any feature licenses and is meant for lab or testing purposes. You can even download the image directly from Juniper's website without needing an account. Thank you, Juniper and Cisco, please take note. In this post, I'll show you how to run Juniper vJunos-router in Containerlab.

Prerequisites

This post assumes you're somewhat familiar with Containerlab and already have it installed. If you're new, feel free to check out my introductory blog below. Containerlab also has great documentation on how to use vJunos-router, so be sure to check that out as well.

1 2 3 3,788