
Multiple Cloudflare services were unavailable for 37 minutes on October 30, 2023. This was due to the misconfiguration of a deployment tool used by Workers KV. This was a frustrating incident, made more difficult by Cloudflare’s reliance on our own suite of products. We are deeply sorry for the impact it had on customers. What follows is a discussion of what went wrong, how the incident was resolved, and the work we are undertaking to ensure it does not happen again.
Workers KV is our globally distributed key-value store. It is used by both customers and Cloudflare teams to manage configuration data, routing lookups, static asset bundles, authentication tokens, and other data that needs low-latency access.
During this incident, KV returned what it believed was a valid HTTP 401 (Unauthorized) status code instead of the requested key-value pair(s) due to a bug in a new deployment tool used by KV.
These errors manifested differently for each product depending on how KV is used by each service, with their impact detailed below.
A number of Cloudflare services depend on Workers KV for distributing configuration, routing information, static asset serving, and authentication state globally. These services instead received Continue reading
AI and machine learning are being more widely used in IT and elsewhere. Today's episode opens the AI magic box to better understand what's inside, including software and hardware. We discuss essentials such as training models and parameters, software components, GPUs, networking, and storage. We also discuss using cloud-based AI platforms vs. building your own in-house, and what to consider when assembling your own AI infrastructure.
The post D2C218: What’s Inside The AI Magic Box? appeared first on Packet Pushers.
Almost exactly a decade ago I wrote about a paper describing how IBGP migrations can cause forwarding loops and how one could reorder BGP reconfiguration steps to avoid them.
One of the paper’s authors was Laurent Vanbever, who has since moved to ETH Zurich, where his group keeps producing great work, including the Chameleon tool (code on GitHub) that can tame transient loops while reconfiguring BGP. Definitely something worth looking at if you’re running a large BGP network.
Hash cracking is often paused or stopped for various reasons. Hashcat has a Pause button […]
The post Restoring Hashcat Cracking first appeared on Brezular's Blog.
Two Wi-Fi engineers share their experiences automating wireless workflows, including the role of Python & tools like Postman.
The post HW014: Exploring Wireless Automation From Python To APIs appeared first on Packet Pushers.
The Full Stack Journey is coming to an end. After five years and more than 80 episodes of deeply technical conversations about technologies, tools, and career journeys, this is the final episode of the series. I reflect on my time hosting the podcast, the challenges and pleasures of putting together a show, lessons and insights from all the conversations I've had, thoughts on the state of IT and technology, and what comes next.
The post The Final Journey Of Full Stack Journey appeared first on Packet Pushers.

When it comes to managing Internet properties, the difference between a small technical hiccup and a major incident is often a matter of speed. Proactive alerting plays a crucial role, which is why we were excited when we released HTTP Error Rate notifications, giving administrators visibility into when end users are experiencing errors.
But what if there are issues that don't show up as errors, like a sudden drop in traffic, or a spike?
Today, we're excited to announce Traffic Anomalies notifications, available to enterprise customers. These notifications trigger when Cloudflare detects unexpected changes in traffic, giving another valuable perspective into the health of your systems.
Unexpected changes in traffic could be indicative of many things. If you run an ecommerce site and see a spike in traffic, that could be great news: maybe customers are flocking to your sale, or you just had an ad run on a popular TV show. However, it could also mean that something is going wrong: maybe someone accidentally turned off a firewall rule, and now you’re seeing more malicious traffic. Either way, you might want to know that something has changed.
Similarly, a sudden drop in traffic could mean many things. Perhaps Continue reading
In the last post we used multicast to forward BUM frames. In this post, static ingress replication is used, which removes the requirement for multicast in the underlay. Before setting this up, let’s do a comparison of multicast vs ingress replication. To make this interesting, let’s use a larger topology consisting of 2 Spines and 32 Leafs:

Assume that all Leafs have the same VNIs. In a scenario with multicast, Leaf-1 sends, for example, an ARP request encapsulated in multicast towards Spine-1. This is a single frame. Spine-1 then replicates this frame and sends it on all its links towards the Leafs:

This is very efficient as a single frame is sent by Leaf-1 and then 31 copies are sent on the other links to the Leafs. If the ARP request is 110 bytes, then a total of 32 x 110 = 3520 bytes have been consumed to send this ARP request. This is optimal forwarding from a resource consumption perspective and one of the strengths of multicast. Now let’s compare that to ingress replication. With ingress replication, Leaf-1 sends 31 copies of the frame (each with a different destination IP) towards Spine-1. Spine-1 then forwards those frames on each link towards Continue reading
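The arithmetic for the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every copy is the same 110 bytes as in the example above, all copies travel via Spine-1, and each unicast copy crosses exactly one uplink and one downlink. The function names are hypothetical and not taken from the post.

```python
def multicast_bytes(leafs: int, frame_bytes: int) -> int:
    """Bytes on the wire when the spine replicates a single multicast frame."""
    uplink_frames = 1              # the source leaf sends one frame to the spine
    downlink_frames = leafs - 1    # the spine sends one copy to every other leaf
    return (uplink_frames + downlink_frames) * frame_bytes


def ingress_replication_bytes(leafs: int, frame_bytes: int) -> int:
    """Bytes on the wire when the source leaf creates every copy itself.

    Assumption: each unicast copy crosses one uplink and one downlink.
    """
    copies = leafs - 1             # one unicast copy per remote leaf
    uplink_frames = copies         # all copies share the source leaf's uplink
    downlink_frames = copies       # the spine forwards each copy on one downlink
    return (uplink_frames + downlink_frames) * frame_bytes


if __name__ == "__main__":
    leafs, frame = 32, 110         # topology and frame size from the example
    print("multicast:          ", multicast_bytes(leafs, frame))
    print("ingress replication:", ingress_replication_bytes(leafs, frame))
```

Under these assumptions the multicast case comes to 3520 bytes, matching the figure above, while ingress replication comes to 62 x 110 = 6820 bytes, with the 31 extra frames all landing on the uplink from Leaf-1.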