NetworkingNexus.net

Worth Reading 091925

“Classic” TCP uses an extremely simple loss-based congestion detection algorithm that is intended to save networks from collapsing under extreme overload.

The endgame is a society where corporate algorithms make decisions about employment, education, and social interaction with no accountability.

The rise of Agentic AI, the emergence and adoption of AI agents and agent-to-agent networking to autonomously perform tasks on behalf of humans, has introduced unique challenges for existing security products.

In the landscape of organizational management, a distinction exists between teams that (a) efficiently deliver a high-quality service or product, and (b) innovate and develop their thought leadership in an area of emerging technology.

Broadcom CEO Hock Tan delivered a rather defiant keynote to open the VMware Explore conference in Las Vegas recently, telling the audience they are better off using the latest version of VMware Cloud Foundation (VCF) on-premises than hyperscale cloud service providers.

The public is told that AI systems are super smart and have the world’s info at their electronic beck and call. At the same time, it is humans and human organizations who claim professional expertise and so deliver their “truth” via media and Internet.

While Eutelsat’s OneWeb operates the second-largest …

Hedge 281: Blockchain

What is the relationship between blockchain technologies and network engineering? Is blockchain “just another application,” or are there implications for naming, performance, and connectivity? Austin Federa joins Tom and Russ to discuss the intersection of blockchain and networks.

Technology Short Take 188

Welcome to Technology Short Take #188! I’m back once again with a small collection of articles and links related to a variety of data center-related technologies. I hope you find something useful!

Networking

Scott Hogg discusses using dual-stack network configurations on your DNS name servers as part of an overall IPv6 deployment plan.
Leon Adato explains why knowing what’s under the hood helps when it comes to networking telemetry.
Via Ivan Pepelnjak, I found Suresh Vina’s article on using netlab to build network labs.

Security

The use of malicious software packages continues to be a vector for attacks; here’s the latest discovery of malicious Go and npm packages.
James Kettle explains why HTTP/1.1 must die.
Gary Marcus and Nathan Hamiel discuss why the combination of LLMs plus coding agents has the potential to result in a security nightmare.
Weaponized RAR files are delivering a backdoor to Linux systems using only a filename—more details in the linked article.
WhatsApp patches a zero-click exploit targeting iOS and macOS devices. I don’t use WhatsApp, but I know lots of folks that do (it’s especially popular among my international friends). If you’re a WhatsApp user, be sure to update.
More software supply chain …

You don’t need quantum hardware for post-quantum security

Organizations have finite resources available to combat threats, both by the adversaries of today and those in the not-so-distant future that are armed with quantum computers. In this post, we provide guidance on what to prioritize to best prepare for the future, when quantum computers become powerful enough to break the conventional cryptography that underpins the security of modern computing systems. We describe how post-quantum cryptography (PQC) can be deployed on your existing hardware to protect from threats posed by quantum computing, and explain why quantum key distribution (QKD) and quantum random number generation (QRNG) are neither necessary nor sufficient for security in the quantum age.

Are you quantum ready?

“Quantum” is becoming one of the most heavily used buzzwords in the tech industry. What does it actually mean, and why should you care?

At its core, “quantum” refers to technologies that harness principles of quantum mechanics to perform tasks that are not feasible with classical computers. Quantum computers have exciting potential to unlock advancements in materials science and medicine, but also pose a threat to computer security systems. The term Q-day refers to the day that adversaries possess quantum computers that are large and stable enough to …

Connect and secure any private or public app by hostname, not IP — free for everyone in Cloudflare One

Connecting to an application should be as simple as knowing its name. Yet, many security models still force us to rely on brittle, ever-changing IP addresses. And we heard from many of you that managing those ever-changing IP lists was a constant struggle.

Today, we’re taking a major step toward making that a relic of the past.

We're excited to announce that you can now route traffic to Cloudflare Tunnel based on a hostname or a domain. This allows you to use Cloudflare Tunnel to build simple zero-trust and egress policies for your private and public web applications without ever needing to know their underlying IP. This is one more step on our mission to strengthen platform-wide support for hostname- and domain-based policies in the Cloudflare One SASE platform, simplifying complexity and improving security for our customers and end users.

Grant access to applications, not networks

In August 2020, the National Institute of Standards and Technology (NIST) published Special Publication 800-207, encouraging organizations to abandon the “castle-and-moat” model of security (where trust is established on the basis of network location) and move to a Zero Trust model (where we “verify anything and everything attempting to establish access”).

Nvidia And Hyperscaler Friends Do Massive AI Deals In The UK

As adoption of generative AI, agentic AI, and all other AIs expands around the globe, a focus in the industry can enable nations to more closely use their own infrastructure, workforces, and data to control, create, and deploy AI models. …

Nvidia And Hyperscaler Friends Do Massive AI Deals In The UK was written by Jeffrey Burt at The Next Platform.

Arista EOS Hates a Routing Instance with No Interfaces

I always ask engineers reporting a netlab bug to provide a minimal lab topology that would reproduce the error, sometimes resulting in “interesting” side effects. For example, I was trying to debug a BGP-related Arista EOS issue using a netlab topology similar to this one:

defaults.device: eos
module: [ bgp ]
nodes:
  a: { bgp.as: 65000 }
  b: { bgp.as: 65001 }

Imagine my astonishment when the two switches failed to configure BGP. Here’s the error message I got when running netlab’s deploy device configurations Ansible playbook:

The RUM Diaries: enabling Web Analytics by default

Measuring and improving performance on the Internet can be a daunting task because it spans multiple layers: from the user’s device and browser, to DNS lookups and the network routes, to edge configurations and origin server location. Each layer introduces its own sources of variability, such as last-mile bandwidth constraints, third-party scripts, or limited CPU resources, that are often invisible unless you have robust observability tooling in place. Even if you gather data from most of these Internet hops, performance engineers still need to correlate different metrics like front-end events, network processing times, and server-side logs in order to pinpoint where and why elusive “latency” occurs and understand how to fix it.

We want to solve this problem by providing a powerful, in-depth monitoring solution that helps you debug and optimize applications, so you can understand and trace performance issues across the Internet, end to end.

That’s why we’re excited to announce the start of a major upgrade to Cloudflare’s performance analytics suite: Web Analytics as part of our real user monitoring (RUM) tools will soon be combined with network-level insights to help you pinpoint performance issues anywhere on a packet’s journey — from a visitor’s browser, through Cloudflare’s network, to your …

Google Shows Off Its Inference Scale And Prowess

If the hyperscalers are masters of anything, it is driving scale up and driving costs down so that a new type of information technology can be cheap enough to be widely deployed. …

Google Shows Off Its Inference Scale And Prowess was written by Timothy Prickett Morgan at The Next Platform.

SwiNOG 40: When a Routing Control Function Is Too Fresh

During integration testing, I find unexpected quirks in network devices way too often. However, that’s infinitely better than experiencing them in production (even after thoroughly testing stuff) while discovering that your peers don’t care about routing security, RPKI, and similar useless stuff.

For example, what happens if you define a new Routing Control Function (RCF) on Arista EOS and apply it to BGP routing updates in the same configuration session? You’ll find out in the Sorry We Messed Up (video) presentation Stefan Funke had at SwiNOG 40 (note: the bug has been fixed in the meantime).

Calico Whisker vs. Traditional Observability: Why Context Matters in Kubernetes Networking

Are you tired of digging through cryptic logs to understand your Kubernetes network? In today’s fast-paced cloud environments, clear, real-time visibility isn’t a luxury, it’s a necessity. Traditional logging and metrics often fall short, leaving you without the context needed to troubleshoot effectively.

That’s precisely what Calico Whisker’s recent launch (with Calico v3.30) aims to solve. This tool provides clarity where logs alone fall short. In the sections below, you’ll get a practical overview of how it works and how it fits into modern Kubernetes networking and security workflows.

If you’re relying on logs for network observability, you’re not alone. While this approach can provide some insights, it’s often a manual, resource-intensive process that puts significant load on your distributed systems. It’s simply not a cloud-native solution for real-time insights.

So are we doomed? No. Calico Whisker transforms network observability from a chore into a superpower.

What is Calico Whisker?

Calico Whisker is a free, lightweight, Kubernetes-native observability user interface (UI) created by Tigera and introduced with Calico Open Source v3.30. It’s designed to give you a simple yet powerful window into your cluster’s network traffic, helping you understand network flows and evaluate policy behavior in real-time.

In …

Nvidia Takes The Commanding Lead In Datacenter Ethernet Switching

Well, that didn’t take long. In April 2020, Nvidia completed its $6.9 billion acquisition of Mellanox Technologies for its InfiniBand and Ethernet switching, and a little more than five years and a GenAI boom later Nvidia has been crowned the leading revenue generator for Ethernet switching in the datacenter by IDC. …

Nvidia Takes The Commanding Lead In Datacenter Ethernet Switching was written by Timothy Prickett Morgan at The Next Platform.

Updated: netlab Network Topology Graphs

netlab release 25.09 introduced numerous graphing enhancements and a new graph type (IS-IS graphs), so I decided to write a series of blog posts explaining how you can generate graphs from netlab lab topologies.

I wrote an intro to netlab topology graphs years ago, and as expected, it was hopelessly outdated, so I started the project with a complete overhaul of that article.

Field Of GPUs

“If you build it, they will come,” as we all learned from watching Field of Dreams three and a half decades ago. …

Field Of GPUs was written by Timothy Prickett Morgan at The Next Platform.

Integrating CrowdStrike Falcon Fusion SOAR with Cloudflare’s SASE platform

The challenge of manual response

Security teams know all too well the grind of manual investigations and remediation. With the mass adoption of AI and increasingly automated attacks, defenders cannot afford to rely on overly manual, low priority, and complex workflows.

Heavily burdensome manual response introduces delays as analysts bounce between consoles and high alert volumes, contributing to alert fatigue. Even worse, it prevents security teams from dedicating time to high-priority threats and strategic, innovative work. To keep pace, SOCs need automated responses that contain and remediate common threats at machine speed before they become business-impacting incidents.

Expanding our capabilities with CrowdStrike Falcon® Fusion SOAR

That’s why today, we’re excited to announce a new integration between the Cloudflare One platform and CrowdStrike's Falcon® Fusion SOAR.

As part of our ongoing partnership with CrowdStrike, this integration introduces two out-of-the-box integrations for Zero Trust and Email Security designed for organizations already leveraging CrowdStrike Falcon® Insight XDR or CrowdStrike Falcon® Next-Gen SIEM.

This allows SOC teams to gain powerful new capabilities to stop phishing, malware, and suspicious behavior faster, with less manual effort.

Out-of-the-box integrations

Although teams can always create custom automations, we’ve made it simple to get started with two …

Condor Technology To Fly “Cuzco” RISC-V CPU Into The Datacenter

Once a hyperscaler or a cloud builder gets big enough, it can afford to design custom compute engines that more precisely match its needs. …

Condor Technology To Fly “Cuzco” RISC-V CPU Into The Datacenter was written by Jeffrey Burt at The Next Platform.

[FATAL] Ansible Release 12.0 Breaks netlab Jinja2 Templates

On September 9th, Ansible release 12.0 appeared on PyPI. It requires ansible-core release 2.19, which includes breaking changes to Jinja2 templating. netlab Jinja2 templates rely on a few Ansible Jinja2 filters; netlab thus imports and uses those filters, and it looks like those imports pulled in the breaking changes that consequently broke the netlab containerlab configuration file template (details).

netlab did not check the Ansible core version (we never had a similar problem in the past), and the installation scripts did not pin the Ansible version (feel free to blame me for this one), which means that any new netlab installation created after September 9th crashed miserably on the simplest lab topologies.

This is the workaround we implemented in netlab release 25.09-post1 (released earlier today):

Oracle Cloud Can Be As Big As AWS This Decade

Wouldn’t it be funny if Larry Ellison, who has become the elder statesman of the datacenter, had the last laugh on the cloud builders and model builders by beating them at their own game? …

Oracle Cloud Can Be As Big As AWS This Decade was written by Timothy Prickett Morgan at The Next Platform.

BGP Multi-Homed with Two ISPs and Two Routers

If you are a Network Engineer working for an Enterprise, you may not work with BGP as often as someone at an ISP does. In most cases, you will only run BGP at the edge of your network to peer with your ISP and leave it at that. There are many ways to connect to an ISP. If you are a small company without your own IP address space or autonomous system, you typically rely on the ISP to allocate a portion of their IP space for you, and you use a static route pointing to them (single-homed). For redundancy, you might connect to two ISPs or take two diverse links from the same ISP (dual-homed/multi-homed). In many of those setups, you may not run BGP yourself, but it depends on the design.

In this post, we will look at a scenario where you already have your own IP address space and an AS number, and you connect to two different ISPs. You will advertise your IP space to the Internet via both ISPs and, at the same time, receive the full Internet routing table from both ISPs.

If you are completely new to BGP, I recommend checking out …

A deep dive into Cloudflare’s September 12, 2025 dashboard and API outage

What Happened

We had an outage in our Tenant Service API which led to a broad outage of many of our APIs and the Cloudflare Dashboard.

The incident’s impact stemmed from several issues, but the immediate trigger was a bug in the dashboard. This bug caused repeated, unnecessary calls to the Tenant Service API. The API calls were managed by a React useEffect hook, but we mistakenly included a problematic object in its dependency array. Because this object was recreated on every state or prop change, React treated it as “always new,” causing the useEffect to re-run each time. As a result, the API call executed many times during a single dashboard render instead of just once. This behavior coincided with a service update to the Tenant Service API, compounding instability and ultimately overwhelming the service, which then failed to recover.

When the Tenant Service became overloaded, it had an impact on other APIs and the dashboard because Tenant Service is part of our API request authorization logic. Without Tenant Service, API request authorization cannot be evaluated. When authorization evaluation fails, API requests return 5xx status codes.
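The dependency-array behavior described above can be illustrated with a small model. The sketch below is not React's source and not the dashboard code; it only assumes what React documents, namely that each dependency is compared with its previous value using Object.is. The names depsChanged, countEffectRuns, and tenantId are hypothetical and exist only for this illustration.

```typescript
// Simplified model of how a hook's dependency array is compared between renders.
// Illustrative only: not React's implementation and not Cloudflare's code.

type Deps = readonly unknown[];

// Returns true when any dependency differs by identity (Object.is),
// the comparison React documents for dependency arrays.
function depsChanged(prev: Deps | undefined, next: Deps): boolean {
  if (prev === undefined) return true; // the first render always runs the effect
  if (prev.length !== next.length) return true;
  return next.some((dep, i) => !Object.is(dep, prev[i]));
}

// Simulates `renders` consecutive renders and counts how often the effect
// (standing in for an API call) would run.
function countEffectRuns(renders: number, makeDeps: () => Deps): number {
  let prev: Deps | undefined;
  let runs = 0;
  for (let i = 0; i < renders; i++) {
    const next = makeDeps();
    if (depsChanged(prev, next)) runs++;
    prev = next;
  }
  return runs;
}

const tenantId = "tenant-123"; // hypothetical identifier

// Object literal rebuilt on every render: a new identity each time,
// so the effect re-runs on every render.
const unstableRuns = countEffectRuns(5, () => [{ tenantId }]);

// Primitive value: compared by value, so it stays stable across renders.
const stableRuns = countEffectRuns(5, () => [tenantId]);

console.log(unstableRuns, stableRuns); // 5 1
```

In this model, depending on a primitive (or on a memoized object whose identity is preserved) keeps the effect from re-running, while a freshly created object triggers it on every render.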

We’re very sorry about the disruption. The rest …
