Imagine you decide to believe the marketing story of your preferred networking vendor and start using the REST API to configure their devices. That probably involves some investment in automation or orchestration tools, as nobody in their right mind wants to use curl or Postman to configure network devices.
A few months later, after your toolchain has been thoroughly tested, you decide to upgrade the operating system on the network devices, and everything breaks. The root cause: the vendor changed their API or the data model between software releases.
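One way to catch this kind of breakage before an upgrade is to compare the data a device returns on the old and new software releases. The following is a minimal sketch, not any vendor's tooling: the two JSON payloads are hypothetical, and `flatten_paths` and `diff_models` are illustrative helper names. It flattens each response into a set of (path, type) pairs and reports what disappeared or appeared.

```python
import json


def flatten_paths(node, prefix=""):
    """Return a set of (path, type name) pairs for every leaf in a JSON-like object."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= flatten_paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        # List items share one path so that element order and count do not matter.
        for item in node:
            paths |= flatten_paths(item, f"{prefix}[]")
    else:
        paths.add((prefix, type(node).__name__))
    return paths


def diff_models(old, new):
    """Report leaf paths that were removed or added between two responses."""
    old_paths, new_paths = flatten_paths(old), flatten_paths(new)
    return {
        "removed": sorted(old_paths - new_paths),
        "added": sorted(new_paths - old_paths),
    }


# Hypothetical responses from the same endpoint on two software releases.
before = json.loads('{"interface": {"name": "eth0", "mtu": 1500, "ipv4": ["10.0.0.1/24"]}}')
after = json.loads('{"interface": {"name": "eth0", "mtu": "1500", "ip": {"v4": ["10.0.0.1/24"]}}}')

drift = diff_models(before, after)
for change, entries in drift.items():
    for path, type_name in entries:
        print(f"{change}: {path} ({type_name})")
```

In this made-up example the check flags both a renamed attribute (`ipv4` moved under `ip/v4`) and a changed value type (`mtu` became a string), the two kinds of change most likely to break a toolchain silently.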
It is a tumultuous time for any agency in the US government or any company or organization that depends on the US government for a sizable portion of its funding or revenue. …
If NSF Snoozes, Then TACC’s “Horizon” Supercomputer Loses was written by Timothy Prickett Morgan at The Next Platform.
I wanted to test a loop-prevention scenario for summary LSAs propagated across areas (more about that in another blog post) using the lab topology I developed for the When OSPF Becomes a Distance Vector Protocol article.
I started the lab with the FRRouting routers and configured OSPF area ranges. Astonishingly, I discovered that the more-specific prefixes from an area appear as summary routes in the backbone area even when the area range is configured. When I tried to reproduce the scenario a few days later, it turned out to be a timing quirk (I didn’t wait long enough), but my squirrelly mind was already investigating.
As I travel further north on the canals, the mobile signal coverage is gradually getting worse, so I decided to build a monitoring solution to help decide where to moor. My initial idea was to use an Intel NUC or Raspberry Pi with a 12 V PSU, but then a friend told me how he was monitoring his home lab using Kubernetes on an old Android phone. It sounded like the perfect solution.
Welcome to the 21st edition of the Cloudflare DDoS Threat Report. Published quarterly, this report offers a comprehensive analysis of the evolving threat landscape of Distributed Denial of Service (DDoS) attacks based on data from the Cloudflare network. In this edition, we focus on the first quarter of 2025. To view previous reports, visit www.ddosreport.com.
While this report primarily focuses on 2025 Q1, it also includes late-breaking data from a hyper-volumetric DDoS campaign observed in April 2025, featuring some of the largest attacks ever publicly disclosed. In a historic surge of activity, we blocked the most intense packet rate attack on record, peaking at 4.8 billion packets per second (Bpps), 52% higher than the previous benchmark, and separately defended against a massive 6.5 terabits-per-second (Tbps) flood, matching the highest bandwidth attacks ever reported.
In the first quarter of 2025, Cloudflare blocked 20.5 million DDoS attacks. That represents a 358% year-over-year (YoY) increase and a 198% quarter-over-quarter (QoQ) increase.
Around one third of those, 6.6 million, targeted the Cloudflare network infrastructure directly, as part of an 18-day multi-vector attack campaign.
Furthermore, in the first quarter of 2025, Cloudflare blocked approximately …
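The percentages quoted above imply baseline figures that the excerpt does not state. As a rough sanity check, the sketch below back-calculates them from the numbers in the text. The derived values are approximations that depend on how the published percentages were rounded, not figures taken from the report, and `baseline_from_increase` is an illustrative helper name.

```python
def baseline_from_increase(current, pct_increase):
    """Return the earlier value, given the current value and the percentage increase."""
    return current / (1 + pct_increase / 100)


total_q1_2025 = 20.5  # million attacks blocked in 2025 Q1 (from the text)

implied_q1_2024 = baseline_from_increase(total_q1_2025, 358)  # 358% YoY increase
implied_q4_2024 = baseline_from_increase(total_q1_2025, 198)  # 198% QoQ increase
infra_share = 6.6 / total_q1_2025                             # "around one third"
implied_prev_bpps = baseline_from_increase(4.8, 52)           # 52% above previous record

print(f"Implied 2024 Q1 total: {implied_q1_2024:.2f} million")
print(f"Implied 2024 Q4 total: {implied_q4_2024:.2f} million")
print(f"Share aimed at Cloudflare infrastructure: {infra_share:.1%}")
print(f"Implied previous packet-rate record: {implied_prev_bpps:.2f} Bpps")
```

Note that a 358% increase means the new value is 4.58 times the old one, not 3.58 times, which is why the helper divides by `1 + pct_increase / 100`.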
A massive power outage struck significant portions of Portugal and Spain at 10:34 UTC on April 28, grinding transportation to a halt, shutting retail businesses, and otherwise disrupting everyday activities and services. Parts of France were also reportedly impacted by the power outage. Portugal’s electrical grid operator blamed the outage on a "fault in the Spanish electricity grid", and later stated that "due to extreme temperature variations in the interior of Spain, there were anomalous oscillations in the very high voltage lines (400 kilovolts), a phenomenon known as 'induced atmospheric vibration'" and that "These oscillations caused synchronisation failures between the electrical systems, leading to successive disturbances across the interconnected European network." However, the operator later denied these claims.
The breadth of Cloudflare’s network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the Internet impact of this power outage at both a local and national level, as well as at a network level, across traffic, network quality, and routing metrics.
In Portugal, Internet traffic fell as the power grid failed, immediately dropping by half compared to the …
Although there are best practices for AI Fabric backend networks, such as Data Center Quantized Congestion Control (DCQCN) for congestion avoidance, rail-optimized routed Clos fabrics, and Layer 2 Rail-Only topologies for small-scale implementations, each vendor offers its own validated design. This approach is beneficial because validated designs are thoroughly tested, and when you build your system based on the vendor’s recommendations, you receive full vendor support and avoid having to reinvent the wheel.
However, instead of focusing on any specific vendor’s design, this chapter explains general design principles for building a resilient, non-blocking, and lossless Ethernet backend network for AI workloads.
Before diving into backend network design, this chapter first provides a high-level overview of a GPU server based on NVIDIA H100 GPUs. The first section introduces a shared NIC architecture, where 8 GPUs share two NICs. The second section covers an architecture where each of the 8 GPUs has a dedicated NIC.
Figure 13-1 illustrates a shared NIC approach. In this example setup, NVIDIA H100 GPUs 0–3 are connected to NVSwitch chips 1-1, 1-2, 1-3, and 1-4 on baseboard-1, while GPUs 4–7 are connected to NVSwitch chips 2-1, 2-2, 2-3, and 2-4 on baseboard-2. Each GPU connects …
LLMs are a technology that needs no introduction.
LLMs + Networking = Awesome! Just dropped a playlist with the 9 key prompting bits that’ll help you organize and understand your network stuff way better. You know what to do!
One of the most important aspects is function calling, where you combine structured data with a call to a specific tool to get the information in the right format. Let me know your thoughts.
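To make function calling a bit more concrete, here is a minimal sketch of the application side of the pattern. It assumes the model has already replied with a JSON tool call. No real LLM API is used, and the tool `get_interface_status`, the `TOOLS` registry, and the hard-coded inventory are all hypothetical.

```python
import json


def get_interface_status(device: str, interface: str) -> dict:
    """Hypothetical tool; a real one would query the device or a monitoring system."""
    inventory = {("r1", "eth0"): "up", ("r1", "eth1"): "down"}
    status = inventory.get((device, interface), "unknown")
    return {"device": device, "interface": interface, "status": status}


# Registry of tools the model may call, with the argument types each one expects.
TOOLS = {
    "get_interface_status": {
        "function": get_interface_status,
        "parameters": {"device": str, "interface": str},
    },
}


def dispatch(tool_call_json: str) -> dict:
    """Validate a JSON tool call produced by a model and run the matching function."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    args = call["arguments"]
    for name, expected_type in tool["parameters"].items():
        if not isinstance(args.get(name), expected_type):
            raise ValueError(f"argument {name!r} must be of type {expected_type.__name__}")
    return tool["function"](**args)


# Simulated model output: a structured call instead of free-form text.
model_output = '{"name": "get_interface_status", "arguments": {"device": "r1", "interface": "eth1"}}'
result = dispatch(model_output)
print(json.dumps(result))
```

The point of the pattern is that both the request and the answer are structured data, so the result can be fed back to the model or into other tooling without parsing prose.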
Intel’s new chief executive officer, Lip-Bu Tan, has his work cut out for him, just like his predecessor, Pat Gelsinger, did several years ago. …
“No Quick Fixes” As Intel Losses And Restructurings Continue was written by Timothy Prickett Morgan at The Next Platform.
Solving technology problems often involves breaking a problem into multiple smaller problems, building interaction surfaces between the pieces, and gluing the pieces back into a larger system. We also know every technology problem is actually a people problem, whether in the past, the present, or the future.
Given these two points, can we say something like: “If technology and people problems are interchangeable, we should be able to solve people problems the way we solve technology problems–via modularization?”
Join us as Tom, Eyvonne, and Russ discuss how this might–or might not–apply to the real world. The second trend we’re discussing on this episode of the Hedge is the apparent movement towards government telling data center operators to “bring your own power.”
After discussing networking layers and addressing, it’s time to focus on moving packets across a network. Vendors love to use ill-defined terms like switching instead of forwarding, routing, or bridging, so let’s start with the terminology.
Connecting all relevant devices to a single cable would indubitably simplify any networking stack, but unfortunately, we’re almost never that lucky. We need devices in the network (typically with multiple interfaces) that perform packet forwarding between end nodes.