
Multiple Cloudflare services were unavailable for 37 minutes on October 30, 2023, due to the misconfiguration of a deployment tool used by Workers KV. This was a frustrating incident, made more difficult by Cloudflare’s reliance on our own suite of products. We are deeply sorry for the impact it had on customers. What follows is a discussion of what went wrong, how the incident was resolved, and the work we are undertaking to ensure it does not happen again.
Workers KV is our globally distributed key-value store. It is used by customers and Cloudflare teams alike to manage configuration data, routing lookups, static asset bundles, authentication tokens, and other data that needs low-latency access.
During this incident, KV returned what it believed was a valid HTTP 401 (Unauthorized) status code instead of the requested key-value pair(s) due to a bug in a new deployment tool used by KV.
These errors manifested differently for each product depending on how KV is used by each service, with their impact detailed below.
A number of Cloudflare services depend on Workers KV for distributing configuration, routing information, static asset serving, and authentication state globally. These services instead received …
Timing is a funny thing. The summer of 2006 when AMD bought GPU maker ATI Technologies for $5.6 billion and took on both Intel in CPUs and Nvidia in GPUs was the same summer when researchers first started figuring out how to offload single-precision floating point math operations from CPUs to Nvidia GPUs to try to accelerate HPC simulation and modeling workloads. …
The post AMD’s Instinct GPU Business Is Coiled To Spring first appeared on The Next Platform.
AMD’s Instinct GPU Business Is Coiled To Spring was written by Timothy Prickett Morgan at The Next Platform.
AI and machine learning are being more widely used in IT and elsewhere. Today's episode opens the AI magic box to better understand what's inside, including software and hardware. We discuss essentials such as training models and parameters, software components, GPUs, networking, and storage. We also discuss using cloud-based AI platforms vs. building your own in-house, and what to consider when assembling your own AI infrastructure.
The post D2C218: What’s Inside The AI Magic Box? appeared first on Packet Pushers.
Almost exactly a decade ago I wrote about a paper describing how IBGP migrations can cause forwarding loops and how one could reorder BGP reconfiguration steps to avoid them.
One of the paper’s authors was Laurent Vanbever, who has since moved to ETH Zurich, where his group keeps producing great work, including the Chameleon tool (code on GitHub) that can tame transient loops while reconfiguring BGP. Definitely something worth looking at if you’re running a large BGP network.
Hash cracking is often paused or stopped for various reasons. Hashcat has a Pause button […]
The post Restoring Hashcat Cracking first appeared on Brezular's Blog.
Because they are in the front of the line for acquiring Nvidia datacenter GPUs, the hyperscalers and cloud builders are going to be the ones who benefit mightily from shortages of matrix math engines that can train AI models and run inference against them. …
The post Amazon Gears Up To Profit Mightily From The Generative AI Boom first appeared on The Next Platform.
Amazon Gears Up To Profit Mightily From The Generative AI Boom was written by Timothy Prickett Morgan at The Next Platform.
Two Wi-Fi engineers share their experiences automating wireless workflows, including the role of Python & tools like Postman.
The post HW014: Exploring Wireless Automation From Python To APIs appeared first on Packet Pushers.
The Full Stack Journey is coming to an end. After five years and more than 80 episodes of deeply technical conversations about technologies, tools, and career journeys, this is the final episode of the series. I reflect on my time hosting the podcast, the challenges and pleasures of putting together a show, lessons and insights from all the conversations I've had, thoughts on the state of IT and technology, and what comes next.
The post The Final Journey Of Full Stack Journey appeared first on Packet Pushers.

When it comes to managing Internet properties, the difference between a small technical hiccup and a major incident is often a matter of speed. Proactive alerting plays a crucial role, which is why we were excited when we released HTTP Error Rate notifications, which give administrators visibility into when end users are experiencing errors.
But what if there are issues that don't show up as errors, like a sudden drop in traffic, or a spike?
Today, we're excited to announce Traffic Anomalies notifications, available to enterprise customers. These notifications trigger when Cloudflare detects unexpected changes in traffic, giving another valuable perspective into the health of your systems.
Unexpected changes in traffic can indicate many things. If you run an ecommerce site and see a spike in traffic, that could be great news: maybe customers are flocking to your sale, or you just had an ad run on a popular TV show. However, it could also mean that something is going wrong: maybe someone accidentally turned off a firewall rule, and now you’re seeing more malicious traffic. Either way, you might want to know that something has changed.
Similarly, a sudden drop in traffic could mean many things. Perhaps …
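
As a rough illustration of what "unexpected changes in traffic" could mean in practice, here is a minimal sketch that flags a spike or a drop by comparing the latest request count with recent history. It is not taken from the announcement and is not a description of how Cloudflare's notifications work: the function name `traffic_anomaly`, the z-score approach, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def traffic_anomaly(history, current, threshold=3.0):
    """Classify `current` against `history` of request counts.

    Hypothetical helper for illustration only: returns "spike", "drop",
    or None, based on how many sample standard deviations `current`
    lies from the mean of `history`.
    """
    if len(history) < 2:
        # Not enough data to estimate a standard deviation.
        return None

    mu = mean(history)
    sigma = stdev(history)

    if sigma == 0:
        # Perfectly flat history: any difference counts as a change.
        if current == mu:
            return None
        return "spike" if current > mu else "drop"

    z = (current - mu) / sigma
    if z >= threshold:
        return "spike"
    if z <= -threshold:
        return "drop"
    return None


# Requests per minute over the last six minutes (made-up numbers).
recent = [100, 102, 98, 101, 99, 100]
print(traffic_anomaly(recent, 150))  # far above the recent mean
print(traffic_anomaly(recent, 20))   # far below the recent mean
print(traffic_anomaly(recent, 101))  # within normal variation
```

Note that the sketch reports spikes and drops in the same way and says nothing about whether the change is good or bad news; as the text points out, that judgment depends on context the traffic numbers alone do not carry.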