HS099: From CLI to CFO: Translating Complex Network Data into Clear Strategic and Financial Insights (Sponsored)

IT and network leaders need more than uptime—they need to know what their networks cost, what they deliver, and how future changes will impact the business. That’s where Netos comes in. CEO and founder Richard Foster joins Johna and John in a lively discussion to explore how Netos turns complex operational data into clear financial... Read more »

Comparing AI / ML activity from two production networks

AI Metrics describes how to deploy the open source ai-metrics application. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter. The screen capture from the article (above) shows results from a simulated 48,000 GPU cluster.
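As a reminder of what the five collective operations named above compute, here is a toy single-process Python sketch. It only models the data semantics on plain lists (one list per GPU rank, summation as the reduction); it is not NCCL, does not touch RoCEv2, and every function name is made up for illustration.

```python
# Toy model of collective semantics: bufs[r] is the buffer held by rank r.
# Assumption: the reduction operator is element-wise sum.

def _column_sums(bufs):
    # Element-wise sum across all ranks.
    return [sum(col) for col in zip(*bufs)]

def all_reduce(bufs):
    # Every rank ends up with the element-wise sum.
    total = _column_sums(bufs)
    return [list(total) for _ in bufs]

def broadcast(bufs, root=0):
    # Every rank ends up with a copy of the root's buffer.
    return [list(bufs[root]) for _ in bufs]

def reduce_to_root(bufs, root=0):
    # Only the root receives the sum; other ranks keep their data.
    total = _column_sums(bufs)
    return [list(total) if r == root else list(b) for r, b in enumerate(bufs)]

def all_gather(bufs):
    # Every rank ends up with the concatenation of all buffers.
    gathered = [x for b in bufs for x in b]
    return [list(gathered) for _ in bufs]

def reduce_scatter(bufs):
    # The sum is split into equal chunks; rank r keeps chunk r.
    total = _column_sums(bufs)
    chunk = len(total) // len(bufs)
    return [total[r * chunk:(r + 1) * chunk] for r in range(len(bufs))]

ranks = [[1, 2], [3, 4]]          # two ranks, two elements each
print(all_reduce(ranks))          # [[4, 6], [4, 6]]
print(all_gather(ranks))          # [[1, 2, 3, 4], [1, 2, 3, 4]]
print(reduce_scatter(ranks))      # [[4], [6]]
```

On a real cluster each of these operations turns into bursts of RDMA traffic between GPUs, which is the traffic the dashboard charts.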

This article goes beyond simulation to demonstrate the AI Metrics dashboard by comparing live traffic seen in two production AI clusters.

Cluster 1

This cluster consists of 250 GPUs connected via 100G ports to a single large switch. The results are consistent with the simulation from the original article. In this case there is no Core Link Traffic because the cluster consists of a single switch. The Discards chart shows a burst of Out (egress) discards and the Drop Reasons chart gives the reason as ingress_vlan_filter. The Total Traffic, Operations, Edge Link Traffic, and RDMA Operations charts all show a transient drop in throughput coincident with the discard spike. Further details of the dropped packets, such as source/destination address, operation, ingress / egress port, QP pair, etc. can be extracted from the sFlow Dropped Packet Notifications that are populating Continue reading

“You get Instant Purge, and you get Instant Purge!” — all purge methods now available to all customers

There's a tradition at Cloudflare of launching real products on April 1, instead of the usual joke product announcements circulating online today. In previous years, we've introduced impactful products like 1.1.1.1 and 1.1.1.1 for Families. Today, we're excited to continue this tradition by making every purge method available to all customers, regardless of plan type.

During Birthday Week 2024, we announced our intention to bring the full suite of purge methods — including purge by URL, purge by hostname, purge by tag, purge by prefix, and purge everything — to all Cloudflare plans. Historically, methods other than "purge by URL" and "purge everything" were exclusive to Enterprise customers. However, we've been openly rebuilding our purge pipeline over the past few years (hopefully you’ve read some of our blog series), and we're thrilled to share the results more broadly. We've spent recent months ensuring the new Instant Purge pipeline performs consistently under 150 ms, even during increased load scenarios, making it ready for every customer.  
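To make the five purge methods concrete, here is a minimal Python sketch that builds the JSON request body for each one. The endpoint path and the field names (files, hosts, tags, prefixes, purge_everything) reflect our understanding of Cloudflare's public v4 API and should be checked against the current API reference. The helper name purge_payload and the example values are hypothetical. The sketch only builds the payload and sends nothing.

```python
import json

# Assumed endpoint shape; the zone ID is a placeholder to fill in.
PURGE_URL = "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"

# Purge method named in the article -> assumed JSON field in the request body.
FIELD_FOR_METHOD = {
    "url": "files",
    "hostname": "hosts",
    "tag": "tags",
    "prefix": "prefixes",
}

def purge_payload(method, values=None):
    """Build the request body for one purge method (hypothetical helper)."""
    if method == "everything":
        return {"purge_everything": True}
    if method not in FIELD_FOR_METHOD:
        raise ValueError(f"unknown purge method: {method}")
    if not values:
        raise ValueError(f"purge by {method} needs at least one value")
    return {FIELD_FOR_METHOD[method]: list(values)}

# Example values are illustrative only.
for method, values in [
    ("url", ["https://example.com/style.css"]),
    ("hostname", ["images.example.com"]),
    ("tag", ["blog"]),
    ("prefix", ["example.com/assets/"]),
    ("everything", None),
]:
    print(method, "->", json.dumps(purge_payload(method, values)))
```

An actual purge would POST one of these bodies to the endpoint with an API token in the Authorization header. That step is left out here so the sketch has no side effects.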

But that's not all — we're also significantly raising the default purge rate limits for Enterprise customers, allowing even greater purge throughput thanks to the efficiency of our Continue reading

Pure Storage FlashBlade//EXA Boosts AI Performance, Scalability

The injection of generative AI into the bloodstream of the tech titans and now businesses of all sizes and stripes over the past two years has forced IT vendors, from hardware makers to component providers to enterprise application developers, to quickly rework their roadmaps to address the particular demands of and opportunities presented by this emerging technology.

Pure Storage FlashBlade//EXA Boosts AI Performance, Scalability was written by Jeffrey Burt at The Next Platform.

D-Wave Pushes Back At Critics, Shows Off Aggressive Quantum Roadmap

In a panel discussion during the GPU Technology Conference a few weeks ago, Nvidia co-founder and chief executive officer Jensen Huang suggested to executives of several quantum computing companies that calling their systems “computers” may be a misnomer and that a better tag might be “instruments.”

D-Wave Pushes Back At Critics, Shows Off Aggressive Quantum Roadmap was written by Jeffrey Burt at The Next Platform.

NB520: When Good LLMs Do Bad Things, Dell’s Workforce Downsizes and Quantum Key Distribution From Space

Grab a virtual doughnut to blaze through this week’s IT news with Johna Johnson and John Burke as Drew Conry-Murray is enjoying his glazed, filled and sprinkled vacation donuts. Today, we’re going to talk about getting good LLMs to do bad things, Dell’s workforce downsizing, Cloudflare’s recent outage, some developments in space networking, and more.... Read more »

Nvidia Research: The Real Reason Big Green Commands Big Profits

It is safe to say in 2025 that the best job in the world is the chief executive officer of Nvidia, and that the company’s co-founder, Jensen Huang, has steered the company to great heights as much as fellow co-founders Thomas Watson ever did with International Business Machines, Larry Ellison ever did with Oracle, and Steve Jobs ever did with Apple Computer.

Nvidia Research: The Real Reason Big Green Commands Big Profits was written by Timothy Prickett Morgan at The Next Platform.

TNO022: Secure Automation at Enterprise Scale for the Public Sector with Red Hat Ansible (Sponsored)

There are both benefits and challenges when adopting automation in the public sector, but Red Hat Ansible enhances efficiency, security and service delivery. With the right tooling, network operators can integrate automation into existing environments and improve network security. Providing insights into adopting automation in the public sector are Tony Dubiel, Principal Specialist Solution Architect... Read more »

HN774: Who Put These OT Risks In My IT Ops? Fortinet Has Answers (Sponsored)

IT and infosec professionals are used to operating and protecting mission-critical infrastructure: servers, databases, load balancers, and so on. But what about valves that control the flow of gas or oil in a refinery? Temperature and vibration sensors that monitor industrial manufacturing processes? If you’re thinking “That’s not my problem,” think again. There’s a whole... Read more »

Passive BGP Sessions

The Dynamic BGP Peers lab exercise gave you the opportunity to build a large-scale environment in which routers having an approved source IP address (usually matching an ACL/prefix list) can connect to a BGP route reflector or route server.

In a more controlled environment, you’d want to define BGP neighbors on the BGP RR/RS but not waste CPU cycles trying to establish BGP sessions with unreachable neighbors. Welcome to the world of passive BGP sessions.
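The idea can be illustrated with a toy Python model (not a BGP implementation; all names are hypothetical). A passive peer only listens for incoming TCP connections on port 179 and never opens one itself, so a session comes up only if at least one side is active and the peers can reach each other.

```python
# Toy model: which side opens the TCP connection for a BGP session?
# Assumption: "passive" means the peer listens but never initiates.

def tcp_initiators(a_passive, b_passive):
    """Return the set of peers ("A", "B") that try to open the TCP session."""
    initiators = set()
    if not a_passive:
        initiators.add("A")
    if not b_passive:
        initiators.add("B")
    return initiators

def session_can_establish(a_passive, b_passive, reachable=True):
    """A session needs reachability and at least one active side."""
    return reachable and bool(tcp_initiators(a_passive, b_passive))

# Route reflector (A) is passive, client (B) is active:
# only the client spends effort on connection attempts.
print(tcp_initiators(a_passive=True, b_passive=False))         # {'B'}
print(session_can_establish(a_passive=True, b_passive=False))  # True

# Both sides passive: nobody ever opens the connection.
print(session_can_establish(a_passive=True, b_passive=True))   # False
```

The last case is the usual pitfall: if both ends are configured as passive, the session never leaves the idle state.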

Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to session/8-passive and execute netlab up.

N4N019: Howdy, Neighbor! And Other Routing Stuff

In today’s episode, we continue the discussion about routing and routing protocols by focusing on commonalities rather than differences among protocols such as OSPF, RIP, EIGRP, or BGP. We explain how, in general, routing protocols discover each other, communicate, maintain relationships, and exchange routing information. Next, we explore the topics of selecting best paths in... Read more »

Response: Any-to-Any Connectivity in the Internet

Bob left a lengthy comment arguing with the (somewhat black-and-white) claims I made in the Rise of NAT podcast. Let’s start with the any-to-any connectivity:

From my young millennial point of view, the logic is reversed: it is because of NATs and firewalls that the internet became so asymmetrical (client/server) just like the Minitel was designed (yes, I am French), whereas the Internet (and later the web, although a client/server protocol, was meant for everyone to be a client and a server) was designed to be more balanced.

Let’s start with the early Internet. It had no peer-to-peer applications. It connected a few large computers (mainframes) that could act as servers but also allowed terminal-based user access and thus ran per-user clients.