Archive

Category Archives for "Networking"

HS102: IT’s Role In AI (Sponsored)

AI can impact an enterprise in several ways: making individuals more productive, making products and services more effective, and making it easier for customers and partners to do business. IT plays a critical role in enabling AI to have these impacts. On today’s sponsored Heavy Strategy, Cisco CIO Fletcher Previn explains how to locate AI use... Read more »

Synadia Attempts To Reclaim NATS Back From CNCF 

It has become almost commonplace to read about yet another company having regrets about open sourcing their flagship product and relicensing it under a semi-proprietary license. Yes, I’m looking at you, HashiCorp, MongoDB and Redis. Now, though, Synadia, the original creator and donor of NATS, wants to reclaim the project from the CNCF, switching NATS’ open source Apache 2 license to the Business Source License (BSL). But there’s a fly in the soup. You see, according to Synadia’s founder and CEO, “Synadia and its predecessor company funded approximately 97% of the NATS server contributions.” Therefore, “For the NATS ecosystem to flourish, Synadia must also Continue reading

Jevons Paradox and Internet Centrality

William Stanley Jevons was one of the founders of neoclassical economics in the mid-19th century. In the aftermath of the great railway mania of the mid-19th century, he observed that the total consumption of coal had actually increased when technological progress improved the efficiency of steam engines. Jevons Paradox observes that improvements in the efficiency of resource utilisation can act as a positive incentive to increased resource consumption, exceeding the reductions that would be anticipated due to this greater efficiency. How does this relate to the Internet and the current issues relating to Internet Centrality?

Breaking APIs or Data Models Is a Cardinal Sin

Imagine you decide to believe the marketing story of your preferred networking vendor and start using the REST API to configure their devices. That probably involves some investment in automation or orchestration tools, as nobody in their right mind wants to use curl or Postman to configure network devices.

A few months later, after your toolchain has been thoroughly tested, you decide to upgrade the operating system on the network devices, and everything breaks. The root cause: the vendor changed their API or the data model between software releases.
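One way to catch this kind of breakage before an upgrade reaches production is to compare the shape of the data the device returns under the old and new software releases. The following is a minimal sketch of that idea, not any vendor's tooling: the payloads and the `flatten_paths` and `model_diff` helpers are hypothetical, and a real check would load responses captured from lab devices.

```python
def flatten_paths(node, prefix=""):
    """Return the set of slash-separated key paths in a nested JSON-like structure."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= flatten_paths(value, path)
    elif isinstance(node, list):
        # List items share one path segment so that list length does not matter.
        for item in node:
            paths |= flatten_paths(item, f"{prefix}[]")
    return paths


def model_diff(old, new):
    """Report key paths that disappeared or appeared between two payloads."""
    old_paths, new_paths = flatten_paths(old), flatten_paths(new)
    return {
        "removed": sorted(old_paths - new_paths),
        "added": sorted(new_paths - old_paths),
    }


# Hypothetical API responses captured before and after a software upgrade.
before = {"interface": [{"name": "eth0", "mtu": 1500,
                         "ipv4": {"address": "192.0.2.1/24"}}]}
after = {"interface": [{"name": "eth0", "mtu": 1500,
                        "ip": {"v4-address": "192.0.2.1/24"}}]}

diff = model_diff(before, after)
for kind, paths in diff.items():
    for path in paths:
        print(f"{kind}: {path}")
```

Any entry under `removed` is a path the existing automation may still depend on, so it is a reasonable signal to stop and review the toolchain before rolling the upgrade out.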

ChatGPT on OSPF Area Ranges and Summary LSAs

I wanted to test a loop-prevention scenario involving summary LSAs propagated across areas (more about that in another blog post) using the lab topology I developed for the When OSPF Becomes a Distance Vector Protocol article.

I started the lab with the FRRouting routers and configured OSPF area ranges. Astonishingly, I discovered that the more-specific prefixes from an area appear as summary routes in the backbone area even when the area range is configured. When I tried to reproduce the scenario a few days later, it turned out to be a timing quirk (I didn’t wait long enough), but my squirrelly mind was already investigating.
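The expected steady-state behavior can be sketched with Python's standard `ipaddress` module: once an area range is configured, intra-area prefixes that fall inside the range should be represented by the range alone, while prefixes outside it are still advertised individually. The prefixes, the range, and the `split_by_range` helper below are made up for illustration; this models only the prefix-matching logic, not FRRouting's implementation or its timing.

```python
import ipaddress


def split_by_range(prefixes, area_range):
    """Split prefixes into those covered by the area range and those outside it."""
    rng = ipaddress.ip_network(area_range)
    covered, uncovered = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        # subnet_of() raises TypeError on mixed IP versions, so check that first.
        if net.version == rng.version and net.subnet_of(rng):
            covered.append(prefix)
        else:
            uncovered.append(prefix)
    return covered, uncovered


# Hypothetical intra-area prefixes and a hypothetical area range.
area_prefixes = ["10.1.0.0/24", "10.1.1.0/24", "10.2.0.0/24"]
covered, uncovered = split_by_range(area_prefixes, "10.1.0.0/16")

print("suppressed by the range:", covered)
print("still advertised individually:", uncovered)
```

If a covered prefix still shows up in the backbone as its own summary route after the network has converged, that is the anomaly worth investigating.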

Boat Monitoring System

As I travel further north on the canals, the mobile signal coverage is gradually getting worse, so I decided to build a monitoring solution to help with deciding where to moor. My initial idea was to use an Intel NUC or Raspberry Pi with a 12V PSU, but then a friend told me how he was monitoring his home lab using Kubernetes on an old Android phone. It sounded like the perfect solution.

How the April 28, 2025, power outage in Portugal and Spain impacted Internet traffic and connectivity

A massive power outage struck significant portions of Portugal and Spain at 10:34 UTC on April 28, grinding transportation to a halt, shutting retail businesses, and otherwise disrupting everyday activities and services. Parts of France were also reportedly impacted by the power outage. Portugal’s electrical grid operator blamed the outage on a "fault in the Spanish electricity grid", and later stated that "due to extreme temperature variations in the interior of Spain, there were anomalous oscillations in the very high voltage lines (400 kilovolts), a phenomenon known as 'induced atmospheric vibration'" and that "These oscillations caused synchronisation failures between the electrical systems, leading to successive disturbances across the interconnected European network." However, the operator later denied these claims.

The breadth of Cloudflare’s network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the Internet impact of this power outage at both a local and national level, as well as at a network level, across traffic, network quality, and routing metrics.

Impacts in Portugal

Country level

In Portugal, Internet traffic dropped as the power grid failed, with traffic immediately dropping by half as compared to the Continue reading

Targeted by 20.5 million DDoS attacks, up 358% year-over-year: Cloudflare’s 2025 Q1 DDoS Threat Report

Welcome to the 21st edition of the Cloudflare DDoS Threat Report. Published quarterly, this report offers a comprehensive analysis of the evolving threat landscape of Distributed Denial of Service (DDoS) attacks based on data from the Cloudflare network. In this edition, we focus on the first quarter of 2025. To view previous reports, visit www.ddosreport.com.

While this report primarily focuses on 2025 Q1, it also includes late-breaking data from a hyper-volumetric DDoS campaign observed in April 2025, featuring some of the largest attacks ever publicly disclosed. In a historic surge of activity, we blocked the most intense packet rate attack on record, peaking at 4.8 billion packets per second (Bpps), 52% higher than the previous benchmark, and separately defended against a massive 6.5 terabits-per-second (Tbps) flood, matching the highest bandwidth attacks ever reported.

Key DDoS insights

  • In the first quarter of 2025, Cloudflare blocked 20.5 million DDoS attacks. That represents a 358% year-over-year (YoY) increase and a 198% quarter-over-quarter (QoQ) increase. 

  • Around one third of those, 6.6 million, targeted the Cloudflare network infrastructure directly, as part of an 18-day multi-vector attack campaign.

  • Furthermore, in the first quarter of 2025, Cloudflare blocked approximately Continue reading
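The percentages quoted above can be turned back into approximate baselines with a little arithmetic. The sketch below assumes that "an X% increase" means growth relative to the earlier period, so the earlier value is the current value divided by (1 + X/100). Because the published figures are rounded, the results are estimates only, and the `baseline` helper is purely illustrative.

```python
def baseline(current, pct_increase):
    """Earlier value implied by a current value and a percentage increase."""
    return current / (1 + pct_increase / 100)


attacks_q1_2025 = 20.5  # million DDoS attacks blocked in 2025 Q1

prev_year = baseline(attacks_q1_2025, 358)     # implied 2024 Q1 count (millions)
prev_quarter = baseline(attacks_q1_2025, 198)  # implied 2024 Q4 count (millions)
prev_record = baseline(4.8, 52)                # implied earlier packet-rate record (Bpps)
infra_share = 6.6 / attacks_q1_2025            # share aimed at Cloudflare infrastructure

print(f"Implied 2024 Q1 attacks: {prev_year:.2f} million")
print(f"Implied 2024 Q4 attacks: {prev_quarter:.2f} million")
print(f"Implied previous packet-rate record: {prev_record:.2f} Bpps")
print(f"Share targeting Cloudflare infrastructure: {infra_share:.0%}")
```

Under these assumptions the year-ago quarter works out to roughly 4.5 million attacks, the previous quarter to roughly 6.9 million, and the earlier packet-rate record to roughly 3.2 Bpps.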

Backend Network Topologies for AI Fabrics

Although there are best practices for AI Fabric backend networks, such as Data Center Quantized Congestion Control (DCQCN) for congestion avoidance, rail-optimized routed Clos fabrics, and Layer 2 Rail-Only topologies for small-scale implementations, each vendor offers its own validated design. This approach is beneficial because validated designs are thoroughly tested, and when you build your system based on the vendor’s recommendations, you receive full vendor support and avoid having to reinvent the wheel.

However, instead of focusing on any specific vendor’s design, this chapter explains general design principles for building a resilient, non-blocking, and lossless Ethernet backend network for AI workloads.

Before diving into backend network design, this chapter first provides a high-level overview of a GPU server based on NVIDIA H100 GPUs. The first section introduces a shared NIC architecture, where 8 GPUs share two NICs. The second section covers an architecture where each of the 8 GPUs has a dedicated NIC.


Shared NIC


Figure 13-1 illustrates a shared NIC approach. In this example setup, NVIDIA H100 GPUs 0–3 are connected to NVSwitch chips 1-1, 1-2, 1-3, and 1-4 on baseboard-1, while GPUs 4–7 are connected to NVSwitch chips 2-1, 2-2, 2-3, and 2-4 on baseboard-2. Each GPU connects Continue reading

GPT PROMPT AND IDEAS FOR NETWORK ENGINEERS Along with some function calling with gemini

LLMs are a technology that needs no introduction.

LLMs + Networking = Awesome! 😎 Just dropped a playlist with the 9 key prompting bits that’ll help you organize and understand your network stuff way better. You know what to do!

One of the most important aspects is function calling, where you combine the power of structured data with a call to a specific tool to get the information in the right format. Let me know your thoughts.
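The function-calling idea can be sketched without any particular LLM SDK. In the example below, the model's reply is simulated as a JSON string naming a tool and its arguments; the code validates the arguments and dispatches to a local Python function. Everything here is an assumption made for illustration: the JSON shape, the `get_interface_status` tool, and its canned data are hypothetical and do not reflect the Gemini API.

```python
import json


def get_interface_status(device: str, interface: str) -> dict:
    """Hypothetical tool: look up an interface state in canned data."""
    inventory = {("r1", "eth0"): "up", ("r1", "eth1"): "down"}
    status = inventory.get((device, interface), "unknown")
    return {"device": device, "interface": interface, "status": status}


# Registry of callable tools and the argument types each one expects.
TOOLS = {
    "get_interface_status": {
        "function": get_interface_status,
        "parameters": {"device": str, "interface": str},
    },
}


def dispatch(call_json: str) -> dict:
    """Validate a structured tool call and run the matching function."""
    call = json.loads(call_json)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    unexpected = set(args) - set(tool["parameters"])
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    for name, expected_type in tool["parameters"].items():
        if not isinstance(args.get(name), expected_type):
            raise ValueError(f"argument {name!r} must be {expected_type.__name__}")
    return tool["function"](**args)


# Simulated structured output from a model (assumed shape, not a real API reply).
model_output = (
    '{"name": "get_interface_status", '
    '"arguments": {"device": "r1", "interface": "eth1"}}'
)
result = dispatch(model_output)
print(result)
```

Validating the call before executing it matters because the model's output is untrusted input: a misspelled tool name or a missing argument should fail loudly instead of reaching a network device.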

Hedge 267: Can modularization solve people problems?

Solving technology problems often involves breaking a problem into multiple smaller problems, building interaction surfaces between the pieces, and gluing the pieces back into a larger system. We also know every technology problem is actually a people problem–whether in the past, the present, or the future.

Given these two points, can we say something like: “If technology and people problems are interchangeable, we should be able to solve people problems the way we solve technology problems–via modularization?”

Join us as Tom, Eyvonne, and Russ discuss how this might–or might not–apply to the real world. The second trend we’re discussing on this episode of the Hedge is the apparent movement towards government telling data center operators to “bring your own power.”


Switching, Routing, and Bridging Terminology

After discussing networking layers and addressing, it’s time to focus on moving packets across a network. Vendors love to use ill-defined terms like switching instead of forwarding, routing, or bridging, so let’s start with the terminology.

Connecting all relevant devices to a single cable would indubitably simplify any networking stack, but unfortunately, we’re almost never that lucky. We need devices in the network (typically with multiple interfaces) that perform packet forwarding between end nodes.

Worth Reading: BGP Unnumbered in 2025

Gabriel sent me a pointer to a blog post by Rudolph Bott describing the details of BGP Unnumbered implementations on Nokia, Juniper, and Bird.

Even more interestingly, Rudolph points out the elephant I completely missed: RFC 8950 refers to RFC 2545, which requires a GUA IPv6 next hop in BGP updates (well, it uses the SHALL wording, which usually means “troubles ahead”). What do you do if you’re running EBGP on an interface with no global IPv6 addresses? As expected, vendors do different things, resulting in another fun interoperability exercise.

Finally, there’s RFC 7404 that advocates LLA-only infrastructure links, so we might find the answer there. Nope; it doesn’t even acknowledge the problem in the Caveats section.

For even more information, read the Unnumbered IPv4 Interfaces and BGP in Data Center Fabrics blog posts.
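To see which kind of next hop a BGP speaker is dealing with, the address can be classified with Python's standard `ipaddress` module. This is a minimal sketch: the `classify_next_hop` helper is hypothetical, the addresses are examples, and "global unicast" is taken here to mean anything inside 2000::/3, which also includes the 2001:db8::/32 documentation prefix used below.

```python
import ipaddress

# The IPv6 global unicast range as currently allocated.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def classify_next_hop(address: str) -> str:
    """Classify an IPv6 next hop as link-local, global-unicast, or other."""
    # Drop an optional zone identifier such as "%eth0" before parsing.
    addr = ipaddress.IPv6Address(address.split("%", 1)[0])
    if addr.is_link_local:
        return "link-local"
    if addr in GLOBAL_UNICAST:
        return "global-unicast"
    return "other"


for next_hop in ["fe80::1%eth0", "2001:db8::1", "fd00::1"]:
    print(next_hop, "->", classify_next_hop(next_hop))
```

On an interface with only link-local addresses, every candidate next hop lands in the first category, which is exactly the case where vendor behavior diverges.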

Recap: KubeCon + CloudNativeCon Europe 2025

When I got the assignment to attend KubeCon on the 1st of April, I thought it was an April Fools’ prank, but as the date got closer I realized that this is for real and I’ll be on the ground in London at the tenth anniversary of cloud native computing. I’ve seen a lot of tech events during my years in the industry while trying not to get replaced by AI, and I have to say this one stands out!

Image source: CNCF YouTube Channel

Here is my recap of KubeCon + CloudNativeCon Europe 2025.

CalicoCon 2025

CalicoCon is an event that happens twice every year, as a co-located event during KubeCon NA and EU. It’s a free event that allows you to learn about Tigera’s vision for the future of networking and security in the cloud. There’s also an after-party to celebrate our community and people like you who are on this journey with us!

This year our main focus was on Calico v3.30, our upcoming release that will add a lot of anticipated features to Calico, unlocking things like observability, staged network policy, and Gateway API. CalicoCon brought together cloud-native enthusiasts to explore the latest advancements in Calico and Kubernetes networking.

Continue reading
