Hedge 294: Resource Constrained Environments


 
Based on current trends, the future of network design and architecture is going to be about working with and around resource constraints. How would resource constraints impact the way we design and manage networks? Mike Bushong joins Tom, Eyvonne, and Russ to ponder network engineering in a resource-constrained world.
 

 

Migrating from NGINX Ingress to Calico Ingress Gateway: A Step-by-Step Guide

From Ingress NGINX to Calico Ingress Gateway

In our previous post, we addressed the most common questions platform teams are asking as they prepare for the retirement of the NGINX Ingress Controller. With the March 2026 deadline fast approaching, this guide provides a hands-on, step-by-step walkthrough for migrating to the Kubernetes Gateway API using Calico Ingress Gateway. You will learn how to translate NGINX annotations into HTTPRoute rules, run both models side by side, and safely cut over live traffic.
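As a taste of what that translation looks like, here is a minimal, hypothetical sketch (the resource names, namespaces, and Gateway reference are placeholders, not taken from the guide): a route that replaces a typical NGINX rewrite-target annotation with an HTTPRoute URLRewrite filter.

```yaml
# Hypothetical example: the HTTPRoute equivalent of an Ingress that relied on
# the nginx.ingress.kubernetes.io/rewrite-target annotation to strip /api.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route                 # placeholder name
  namespace: web
spec:
  parentRefs:
    - name: calico-gateway        # placeholder Gateway owned by the platform team
      namespace: gateway-system
  hostnames:
    - "app.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: api-svc           # placeholder backend Service
          port: 8080
```

The rewrite becomes an explicit, typed field on the route rather than a controller-specific annotation, which is exactly the kind of change the walkthrough covers step by step.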

A Brief History

The announced retirement of the NGINX Ingress Controller has created a forced migration path for the many teams that relied on it as the industry standard. While the Ingress API is not yet officially deprecated, the Kubernetes SIG Network has designated the Gateway API as its official successor. Legacy Ingress will no longer receive enhancements and exists primarily for backward compatibility.

Why the Industry is Standardizing on Gateway API

While the Ingress API served the community for years, it reached a functional ceiling. Calico Ingress Gateway implements the Gateway API to provide:

  • Role-Oriented Design: Clear separation between the infrastructure (managed by SREs) and routing logic (managed by Developers).
  • Native Expressiveness: Features like URL rewrites and header manipulation Continue reading

2025 Q4 DDoS threat report: A record-setting 31.4 Tbps attack caps a year of massive DDoS assaults

Welcome to the 24th edition of Cloudflare’s Quarterly DDoS Threat Report. In this report, Cloudforce One offers a comprehensive analysis of the evolving threat landscape of Distributed Denial of Service (DDoS) attacks based on data from the Cloudflare network. In this edition, we focus on the fourth quarter of 2025, as well as share overall 2025 data.

The fourth quarter of 2025 was characterized by an unprecedented bombardment launched by the Aisuru-Kimwolf botnet, dubbed “The Night Before Christmas” DDoS attack campaign. The campaign targeted Cloudflare customers as well as Cloudflare’s dashboard and infrastructure with hyper-volumetric HTTP DDoS attacks exceeding rates of 200 million requests per second (rps), just weeks after a record-breaking 31.4 Terabits per second (Tbps) attack.

Key insights

  1. DDoS attacks surged by 121% in 2025, reaching an average of 5,376 attacks automatically mitigated every hour.

  2. In the final quarter of 2025, Hong Kong jumped 12 places, making it the second most DDoS’d place on earth. The United Kingdom also leapt by an astonishing 36 places, making it the sixth most-attacked place.

  3. Infected Android TVs — part of the Aisuru-Kimwolf botnet — bombarded Cloudflare’s network with hyper-volumetric HTTP DDoS attacks, while Telcos emerged as the most-attacked industry.

Ultra Ethernet: Receiver Credit-based Congestion Control (RCCC)

Introduction

Receiver Credit-Based Congestion Control (RCCC) is a cornerstone of the Ultra Ethernet transport architecture, specifically designed to eliminate incast congestion. Incast occurs at the last-hop switch when the aggregate data rate from multiple senders exceeds the capacity of the target's egress link. This mismatch leads to rapid buffer exhaustion on the outgoing interface, resulting in packet drops and severe performance degradation.


The RCCC Mechanism

Figure 8-1 illustrates the operational flow of the RCCC algorithm. In a standard scenario without credit limits, source Rank 0 and Rank 1 might attempt to transmit at their full 100G line rates simultaneously. If the backbone fabric consists of 400G inter-switch links, the core utilization remains a comfortable 50% (200G total traffic). However, because the target host link is only 100G, the last-hop switch (Leaf 1B-1) becomes an immediate bottleneck. The switch is forced to queue packets that cannot be forwarded at the 100G egress rate, eventually triggering incast congestion and buffer overflows.
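To make the arithmetic of that scenario concrete, here is a small, illustrative Python sketch (not UET code; the credit-accounting details are simplified assumptions): the receiver grants credits at no more than its 100G drain rate and splits them across active senders, so the aggregate arrival rate at the last-hop egress can never exceed the target link capacity.

```python
# Illustrative sketch of receiver credit-based pacing (not the UET implementation).
# Two 100G-capable senders target one 100G receiver link; the receiver only
# grants as many credits per interval as its own link can drain, so the
# aggregate arrival rate at the last-hop switch stays at or below 100G.

RECEIVER_LINK_GBPS = 100
SENDER_LINE_RATE_GBPS = 100
SENDERS = ["rank0", "rank1"]
INTERVAL_S = 0.001  # credit grant interval (1 ms)

def grant_credits(active_senders: list[str]) -> dict[str, float]:
    """Split the receiver's drain capacity (gigabits per interval) evenly."""
    budget_gbits = RECEIVER_LINK_GBPS * INTERVAL_S
    share = budget_gbits / len(active_senders)
    return {s: share for s in active_senders}

def offered_load(credits: dict[str, float]) -> float:
    """Each sender transmits min(its line rate, granted credits) per interval."""
    line_rate_gbits = SENDER_LINE_RATE_GBPS * INTERVAL_S
    return sum(min(line_rate_gbits, c) for c in credits.values())

credits = grant_credits(SENDERS)
total = offered_load(credits)
print(f"per-sender credit: {credits['rank0']:.3f} Gbit/interval")
print(f"aggregate arrival: {total / INTERVAL_S:.0f} Gbps (egress capacity: {RECEIVER_LINK_GBPS} Gbps)")
```

Running the sketch shows each sender throttled to roughly 50 Gbps, keeping the last-hop egress at 100 Gbps instead of the 200 Gbps that uncredited transmission would offer.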

While "incast" occurs at the egress interface and can resemble head-of-line blocking, it is fundamentally a "fan-in" problem where multiple sources converge on a single receiver. Under RCCC, standard Explicit Congestion Notification (ECN) on the last-hop switch's egress interface is Continue reading

On MPLS Forwarding Performance Myths

Whenever I claim that the initial use case for MPLS was improved forwarding performance (using the RFC that matches the IETF MPLS BoF slides as supporting evidence), someone inevitably comes up with a source claiming something along these lines:

The idea of speeding up the lookup operation on an IP datagram turned out to have little practical impact.

That might be true, although I do remember how hard it was for Cisco to build the first IP forwarding hardware in the AGS+ CBUS controller. Switching labels would be much faster (or at least cheaper), but the time it takes to do a forwarding table lookup was never the main consideration. It was all about the aggregate forwarding performance of core devices.

Anyhow, Duty Calls. It’s time for another archeology dig. Unfortunately, most of the primary sources irrecoverably went to /dev/null, and personal memories are never reliable; comments are most welcome.

AMD Finally Makes More Money On GPUs Than CPUs In A Quarter

Thanks to pent-up demand for MI308 GPUs in China and the approval of the export license AMD has been chasing since early last year, $360 million in Instinct GPU sales that were not officially part of the pipeline made their way onto the AMD books in Q4 2025.

AMD Finally Makes More Money On GPUs Than CPUs In A Quarter was written by Timothy Prickett Morgan at The Next Platform.

Dassault And Nvidia Bring Industrial World Models To Physical AI

During his more than two decades with Nvidia, Rev Lebaredian has had a ringside seat to the show that has been the evolution of modern AI, from the introduction of the AlexNet deep convolutional neural network that made waves by drastically lowering the error rate at the 2012 ImageNet challenge to the introduction of generative AI and now agentic AI, where systems can create AI assistants to help with knowledge work.

Dassault And Nvidia Bring Industrial World Models To Physical AI was written by Jeffrey Burt at The Next Platform.

OMG, After a Decade, VXLAN Is Still Insecure

In 2017 (over eight years ago), I was making fun of the fact that “VXLAN is insecure” was news to some people. Obviously, the message needed to be repeated, as the same author gave a very similar presentation two years later at a security conference.

Unfortunately, it seems that everything old is new again (see also RFC 1925 rules 4 and 11), as proved by the “Using GRE and VXLAN for Fun and Profit” (my summary) presentation at DEFCON 33. Even if you have known for decades that unencrypted tunnels are insecure (duh!), you might still want to read the summary of the talk (published on the APNIC blog) and view the slides.

Calico Ingress Gateway: Key FAQs Before Migrating from NGINX Ingress Controller

What Platform Teams Need to Know Before Moving to Gateway API

We recently sat down with representatives from 42 companies to discuss a pivotal moment in Kubernetes networking: the NGINX Ingress retirement.

With the March 2026 retirement of the NGINX Ingress Controller fast approaching, platform teams are now facing a hard deadline to modernize their ingress strategy. This urgency was reflected in our recent workshop, “Switching from NGINX Ingress Controller to Calico Ingress Gateway,” which saw an overwhelming turnout, with engineers representing a cross-section of the industry, from financial services to high-growth tech startups.

During the session, the Tigera team highlighted a hard truth for platform teams: the original Ingress API was designed for a simpler era. Today, teams are struggling to manage production traffic through “annotation sprawl”—a web of brittle, implementation-specific hacks that make multi-tenancy and consistent security an operational nightmare.

The move to the Kubernetes Gateway API isn’t just a mandatory update; it’s a graduation to a role-oriented, expressive networking model. We’ve previously explored this shift in our blogs on Understanding the NGINX Retirement and Why the Ingress NGINX Controller is Dead.
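To illustrate that role split, here is a hypothetical sketch (resource names and the GatewayClass are placeholders; your Calico Ingress Gateway installation defines its own): the platform team owns a single shared Gateway, while each application team attaches its own HTTPRoute to it.

```yaml
# Hypothetical names throughout; substitute the GatewayClass your Calico
# Ingress Gateway installation actually provides.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway          # owned by the platform/SRE team
  namespace: gateway-system
spec:
  gatewayClassName: calico      # placeholder class name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All             # or Selector, to scope which tenants may attach
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: team-a-route            # owned by the application team
  namespace: team-a
spec:
  parentRefs:
    - name: shared-gateway
      namespace: gateway-system
  rules:
    - backendRefs:
        - name: team-a-svc      # placeholder backend Service
          port: 80
```

Ownership boundaries become ordinary Kubernetes RBAC boundaries: developers never touch the listener configuration, and platform engineers never edit application routing rules.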

Bridging the Role Gap: Transitioning from the flat, annotation-heavy Ingress model to the role-oriented Continue reading

TACC Explores Mixed Precision And FP64 Emulation For HPC With Horizon

If you want to test out an idea in HPC simulation and modeling and see how it affects a broad array of scientific applications, there is probably not a better place than the Texas Advanced Computing Center at the University of Texas.

TACC Explores Mixed Precision And FP64 Emulation For HPC With Horizon was written by Timothy Prickett Morgan at The Next Platform.

Improve global upload performance with R2 Local Uploads

Today, we are launching Local Uploads for R2 in open beta. With Local Uploads enabled, object data is automatically written to a storage location close to the client first, then asynchronously copied to where the bucket lives. The data is immediately accessible and stays strongly consistent. Uploads get faster, and data feels global.

For many applications, performance needs to be global. Users uploading media content from different regions, for example, or devices sending logs and telemetry from all around the world. But your data has to live somewhere, and that means uploads from far away have to travel the full distance to reach your bucket.

R2 is object storage built on Cloudflare's global network. Out of the box, it automatically caches object data globally for fast reads anywhere — all while retaining strong consistency and zero egress fees. This happens behind the scenes whether you're using the S3 API, Workers Bindings, or plain HTTP. And now with Local Uploads, both reads and writes can be fast from anywhere in the world.
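Because Local Uploads is a bucket-level setting rather than a client-side flag, existing upload code should not need to change. As a rough sketch (the account ID, credentials, and bucket name below are placeholders), an S3-API upload to R2 with boto3 looks like this:

```python
# Minimal sketch: uploading an object to R2 through its S3-compatible API with
# boto3. With Local Uploads enabled on the bucket (a dashboard setting), the
# write should land at a nearby location first and then be copied to the
# bucket's home region asynchronously.
import boto3

ACCOUNT_ID = "<account-id>"          # placeholder

s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{ACCOUNT_ID}.r2.cloudflarestorage.com",
    aws_access_key_id="<r2-access-key-id>",
    aws_secret_access_key="<r2-secret-access-key>",
    region_name="auto",
)

s3.put_object(
    Bucket="media-uploads",          # placeholder bucket name
    Key="videos/clip-001.mp4",
    Body=open("clip-001.mp4", "rb"),
)
```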

Try it yourself in this demo to see the benefits of Local Uploads.

Ready to try it? Enable Local Uploads in the Cloudflare Dashboard under your bucket's settings, or Continue reading

Interface MAC Address in IOS Layer-2 Images

Here’s another “You can’t make this up, but it sounds too crazy to be true” story: Cisco IOS layer-2 images change the interface MAC address when you change the interface switchport status.

Let me start with a bit of background:

  • IOL Layer 2 image starts with interfaces enabled and in bridged (switchport) mode (details)
  • netlab has to run a normalize script (applicable to IOLL2, IOSv L2, and Arista EOS) before configuring anything else to ensure all interfaces are shut down.
  • The IOLL2 normalize Jinja template had a bug – when setting the interface MAC address, it checked l.mac_address instead of intf.mac_address. Nevertheless, everything worked because the MAC addresses were also set during the initial device configuration.
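As an illustration of that one-character class of bug, here is a hypothetical simplification (not the actual netlab template): the fix amounts to referencing the variable the loop actually defines.

```jinja
{# Hypothetical simplification of a normalize template, not the real one. #}
{% for intf in interfaces %}
interface {{ intf.ifname }}
 shutdown
{% if intf.mac_address is defined %}
 mac-address {{ intf.mac_address }}   {# the bug: this read l.mac_address #}
{% endif %}
{% endfor %}
```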

NB560: Microsoft Doubles Down on Custom AI Chip; CrowdStrike Brandishes Big Bucks for Browser Security

Take a Network Break! We’ve got Red Alerts for HPE Juniper Session Smart Routers and SolarWinds. In this week’s news, Microsoft debuts its second-generation AI inferencing chip, Mplify rolls out a new Carrier Ethernet certification for supporting AI workloads, and AWS upgrades its network firewall to spot GenAI application traffic and filter Web categories. Google... Read more »

S3 is the new network: Rethinking data architecture for the cloud era

For decades, distributed databases have been built around the assumption that storage will live close to compute. The farther data travels over the network, the reasoning goes, the greater the potential for delay. Local RAID (redundant array of independent disks) arrays, network-attached storage (NAS), and cluster file systems keep data close, making it quick and easy to access.

But in a distributed system, keeping the entire data store close to compute makes scaling slow, cumbersome, and expensive. Each time a node or cluster is replicated, its associated data must be replicated as well. It isn’t ideal, but until recently, there wasn’t any reasonable alternative. Databases had to scale. Service-level agreements (SLAs) had to be met. Wide-area networks weren’t reliable enough to support high-performance databases at scale. Database designers accordingly spent a great deal of energy solving problems related to coordination, consistency, and replication logic.

But imagine things were different. What if they didn’t have to worry about the network, where their data lived, or how to get it from Point A to Point B? How would they design a database then? That’s the intriguing question raised by the advent of cloud object storage services like AWS S3, Google Cloud Continue reading

Fast FRR Container Configuration

After creating the infrastructure that generates the device configuration files within netlab (not in an Ansible playbook), it was time to try to apply it to something else, not just Linux containers. FRR containers were the obvious next target.

netlab uses two different mechanisms to configure FRR containers:

  • Data-plane features are configured with bash scripts using ip commands and friends.
  • Control-plane features are configured with FRR’s vtysh

I wanted to replace both with Linux scripts that could be started with the docker exec command.
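A rough sketch of what such scripts could look like (the container name, interface, and routing configuration are invented for illustration; the actual netlab implementation may differ):

```bash
#!/bin/bash
# Illustrative sketch (not the actual netlab scripts): configure an FRR
# container from the outside with two "docker exec" calls -- one shell
# snippet for the data plane (ip commands), one vtysh batch for the
# control plane.
CONTAINER=clab-frr-r1        # hypothetical container name

# Data plane: interface state and addresses via iproute2
docker exec "$CONTAINER" bash -c '
  ip link set eth1 up
  ip addr add 10.0.0.1/30 dev eth1
'

# Control plane: routing configuration through vtysh
docker exec "$CONTAINER" vtysh \
  -c 'configure terminal' \
  -c 'router ospf' \
  -c ' network 10.0.0.0/30 area 0'
```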
