Nvidia Takes The Commanding Lead In Datacenter Ethernet Switching

Well, that didn’t take long. In April 2020, Nvidia completed its $6.9 billion acquisition of Mellanox Technologies for its InfiniBand and Ethernet switching, and a little more than five years and a GenAI boom later Nvidia has been crowned the leading revenue generator for Ethernet switching in the datacenter by IDC.

Nvidia Takes The Commanding Lead In Datacenter Ethernet Switching was written by Timothy Prickett Morgan at The Next Platform.

Integrating CrowdStrike Falcon Fusion SOAR with Cloudflare’s SASE platform

The challenge of manual response

Security teams know all too well the grind of manual investigations and remediation. With the mass adoption of AI and increasingly automated attacks, defenders cannot afford to rely on overly manual, low priority, and complex workflows.

Heavily burdensome manual response introduces delays as analysts bounce between consoles and high alert volumes, contributing to alert fatigue. Even worse, it prevents security teams from dedicating time to high-priority threats and strategic, innovative work. To keep pace, SOCs need automated responses that contain and remediate common threats at machine speed before they become business-impacting incidents.

Expanding our capabilities with CrowdStrike Falcon® Fusion’ SOAR

That’s why today, we’re excited to announce a new integration between the Cloudflare One platform and CrowdStrike's Falcon® Fusion SOAR.

As part of our ongoing partnership with CrowdStrike, this integration introduces two out-of-the-box integrations for Zero Trust and Email Security designed for organizations already leveraging CrowdStrike Falcon® Insight XDR or CrowdStrike Falcon® Next-Gen SIEM.

This allows SOC teams to gain powerful new capabilities to stop phishing, malware, and suspicious behavior faster, with less manual effort.

Out-of-the-box integrations

Although teams can always create custom automations, we’ve made it simple to get started with two Continue reading

[FATAL] Ansible Release 12.0 Breaks netlab Jinja2 Templates

On September 9th, the ansible release 12.0 appeared on PyPi. It requires ansible-core release 2.19, which includes breaking changes to Jinja2 templating. netlab Jinja2 templates rely on a few Ansible Jinja2 filters; netlab thus imports and uses those filters, and it looks like those imports pulled in the breaking changes that consequently broke the netlab containerlab configuration file template (details).

netlab did not check the Ansible core version (we never had a similar problem in the past), and the installation scripts did not pin the Ansible version (feel free to blame me for this one), which means that any new netlab installation created after September 9th crashed miserably on the simplest lab topologies.

This is the workaround we implemented in netlab release 25.09-post1 (released earlier today):

BGP Multi-Homed with Two ISPs and Two Routers

BGP Multi-Homed with Two ISPs and Two Routers

If you are a Network Engineer working for an Enterprise, you may not work with BGP as often as someone at an ISP does. In most cases, you will only run BGP at the edge of your network to peer with your ISP and leave it at that. There are many ways to connect to an ISP. If you are a small company without your own IP address space or autonomous system, you typically rely on the ISP to allocate a portion of their IP space for you, and you use a static route pointing to them (single-homed). For redundancy, you might connect to two ISPs or take two diverse links from the same ISP (dual-homed/multi-homed). In many of those setups, you may not run BGP yourself, but it depends on the design.

In this post, we will look at a scenario where you already have your own IP address space and an AS number, and you connect to two different ISPs. You will advertise your IP space to the Internet via both ISPs and, at the same time, receive the full Internet routing table from both ISPs.

If you are completely new to BGP, I recommend checking out Continue reading

A deep dive into Cloudflare’s September 12, 2025 dashboard and API outage

What Happened

We had an outage in our Tenant Service API which led to a broad outage of many of our APIs and the Cloudflare Dashboard. 

The incident’s impact stemmed from several issues, but the immediate trigger was a bug in the dashboard. This bug caused repeated, unnecessary calls to the Tenant Service API. The API calls were managed by a React useEffect hook, but we mistakenly included a problematic object in its dependency array. Because this object was recreated on every state or prop change, React treated it as “always new,” causing the useEffect to re-run each time. As a result, the API call executed many times during a single dashboard render instead of just once. This behavior coincided with a service update to the Tenant Service API, compounding instability and ultimately overwhelming the service, which then failed to recover.

When the Tenant Service became overloaded, it had an impact on other APIs and the dashboard because Tenant Service is part of our API request authorization logic.  Without Tenant Service, API request authorization can not be evaluated.  When authorization evaluation fails, API requests return 5xx status codes.

We’re very sorry about the disruption.  The rest Continue reading

Lab: Running IS-IS over IPv4 Unnumbered and IPv6 LLA Interfaces

IS-IS does not use IPv4 or IPv6, so it should be a no-brainer to run it over IPv4 unnumbered or IPv6 LLA interfaces. The latter is true; the former is smack in the middle of the It Depends™ territory.

Want to know more or test the devices you’re usually working with? The Running IS-IS Over Unnumbered/LLA-only Interfaces lab exercise is just what you need.

Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to basic/7-unnumbered and execute netlab up.

AUSNOG 2025

The Australian Network Operators' Group, AUSNOG, held its 19th meeting at the start of September. Rather than simply relate the content of the presentations I'd like to take a few presentations and place them into a broader context to show how such topics fit today's networked environment.

Securing AI Workloads in Kubernetes: Why Traditional Network Security Isn’t Enough

The AI revolution is here, and it’s running on Kubernetes. From fraud detection systems to generative AI platforms, AI-powered applications are no longer experimental projects; they’re mission-critical infrastructure. But with great power comes great responsibility, and for Kubernetes platform teams, that means rethinking security.

But this rapid adoption comes with a challenge: 13% of organizations have already reported breaches of AI models or applications, while another 8% don’t even know if they’ve been compromised. Even more concerning, 97% of breached organizations reported that they lacked proper AI access controls. To address this, we must recognize that AI architectures introduce entirely new attack vectors that traditional security models aren’t equipped to handle.

AI Architectures Introduce New Attack Vectors

AI workloads running in Kubernetes environments introduce a new set of security challenges. Traditional security models often fall short in addressing the unique complexities of AI pipelines, specifically related to The Multi-Cluster Problem, The East-West Traffic Dilemma, and Egress Control Complexity. Let’s explore each of these critical attack vectors in detail.

The Multi-Cluster Problem

Most enterprise AI deployments don’t run in a single cluster. Instead, they typically follow this pattern:

Training Infrastructure (GPU-Heavy)

The Curious Case of ‘ip host’ Configuration Command

Since time immemorial, I have used the ip host router configuration command to get host-to-IP mappings in networking labs without going through the hassle of setting up a DNS server. Some devices even accepted multiple IP addresses in the ip host command, allowing you to list all router interfaces in a single command and get reverse (IP-to-host) mapping working like a charm. Or so I thought 🤦‍♂️

It turns out I’m too old, and what I know is sometimes no longer true. It seems that the last implementation working as I expected is Cisco IOS Classic ☹️

Packet trimming

The latest version of the AI Metrics dashboard uses industry standard sFlow telemetry from network switches to monitor the number of trimmed packets to use as a congestion metric.

Ultra Ethernet Specification Update describes how the Ultra Ethernet Transport (UET) Protocol has the ability to leverage optional “packet trimming” in network switches, which allows packets to be truncated rather than dropped in the fabric during congestion events. As packet spraying causes reordering, it becomes more complicated to detect loss. Packet trimming gives the receiver and the sender an early explicit indication of congestion, allowing immediate loss recovery in spite of reordering, and is a critical feature in the low-RTT environments where UET is designed to operate.

cumulus@switch:~$ nv set system forwarding packet-trim profile packet-trim-default
cumulus@switch:~$ nv config apply

NVIDIA Cumulus Linux release 5.14 for NVIDA Spectrum Ethernet Switches includes support for Packet Trimming. The above command enables packet trimming, sets the DSCP remark value to 11, sets the truncation size to 256 bytes, sets the switch priority to 4, and sets the eligibility to all ports on the switch with traffic class 1, 2, and 3. NVIDA BlueField host adapters respond to trimmed packets to ensure fast congestion recovery.

Continue reading

Labbing Network Technology Details with netlab

It’s been over four years since I published the last Software Gone Wild episode. In the meantime, I spent most of my time developing an open-source labbing tool, so it should be no surprise that the first post-hiatus episode focused on a netlab use case: how Ethan Banks (of the PacketPushers fame) is using the tool to quickly check the technology details for his N is for Networking podcast.

As expected, our discussion took us all over the place, including (according to Riverside AI):

Why IT-Site1 Can’t Ping OT_Site1R – Show and Tell Time #1

In my earlier blog post, Troubleshooting OT Security: Why IT-Site1 Can’t Ping OT_Site1R, we discovered the reason for this issue. Our “who done it” is simple. For security reasons, we are using Cisco TrustSec to keep them from communicating. Which... Read More ›

The post Why IT-Site1 Can’t Ping OT_Site1R – Show and Tell Time #1 appeared first on Networking with FISH.

Navigating DORA with Calico: Strengthening Kubernetes Operational Resilience in Financial Services

A single cyberattack or system outage can threaten not just one financial institution, but the stability of a vast portion of the entire financial sector. For today’s financial enterprises, securing dynamic infrastructure like Kubernetes is a core operational and regulatory challenge. The solution lies in achieving DORA compliance for Kubernetes, which transforms your cloud-native infrastructure into a resilient, compliant, and secure backbone for critical financial services.

The Challenge DORA Seeks to Solve

Before DORA (Digital Operational Resilience Act), rules for financial companies primarily focused on making sure they had enough financial capital to cover losses. But what if a cyberattack or tech failure brought a large part of the financial system down? Even with plenty of financial capital, a major outage could stop most operations and cause big problems for the whole financial market. DORA steps in to fix this. It’s all about making sure financial firms can withstand, respond to, and recover quickly from cyberattacks and other digital disruptions.

What is DORA?

The Digital Operational Resilience Act (DORA) is a European Union (EU) regulation that came into effect on January 17, 2025 and is designed to strengthen the security of financial entities. It establishes uniform requirements across the financial Continue reading

1 2 3 3,809