NetworkingNexus.net

Cisco’s Hyperscale And Cloud AI Push Will Give It Enterprise Clout

Much of the business that Cisco Systems and others have been doing in the AI infrastructure field since OpenAI lit the generative AI fuse with ChatGPT in November 2022 has been deploying hardware and software with the hyperscalers, a lucrative business that led company executives to promise to sell as much as $1 billion in back-end network technology by the end of its fiscal year and then to blow past that a quarter early. …

Cisco’s Hyperscale And Cloud AI Push Will Give It Enterprise Clout was written by Jeffrey Burt at The Next Platform.

Powering All Ethernet AI Networking

Artificial Intelligence (AI), powered by accelerated processing units (XPUs) like GPUs and TPUs, is transforming industries. The network interconnecting these processors is crucial for efficient and successful AI deployments. AI workloads, involving intensive training and rapid inferencing, require very high bandwidth interconnects with low and consistent latency, and the highest reliability to maximize XPU utilization and reduce AI job completion time (JCT). A best-of-breed network with AI-specific optimizations is critical for delivering AI applications, with any JCT slowdown leading to revenue loss. Typical workloads have fewer, very high-bandwidth, low-entropy flows that run for extended periods, exchanging large messages synchronously, necessitating advanced lossless forwarding and specialized operational tools. They differ from cloud networking traffic as summarized below:

Celebrating 11 years of Project Galileo’s global impact

June 2025 marks the 11th anniversary of Project Galileo, Cloudflare’s initiative to provide free cybersecurity protection to vulnerable organizations working in the public interest around the world. From independent media and human rights groups to community activists, Project Galileo supports those often targeted for their essential work in human rights, civil society, and democracy building.

A lot has changed since we marked the 10th anniversary of Project Galileo. Yet, our commitment remains the same: help ensure that organizations doing critical work in human rights have access to the tools they need to stay online. We believe that organizations, no matter where they are in the world, deserve reliable, accessible protection to continue their important work without disruption.

For our 11th anniversary, we're excited to share several updates including:

An interactive Cloudflare Radar report providing insights into the cyber threats faced by at-risk public interest organizations protected under the project.
An expanded commitment to digital rights in the Asia-Pacific region with two new Project Galileo partners.
New stories from organizations protected by Project Galileo working on the frontlines of civil society, human rights, and journalism from around the world.

Tracking and reporting on cyberattacks with the Project Galileo Continue reading

ArubaCX Cannot Count When Dealing with VXLAN

This blog post describes yet another bizarre example of how reliable digital twins are, but don’t worry; they all work great in PowerPoint.

After “fixing” the integration tests to deal with ArubaCX’s notion of VXLAN VNI having 16 bits, the bridging test worked, but the IRB tests kept failing.

In the IRB test, the lab has two layer-3 switches. Each of them should be able to bridge within a VLAN/VXLAN segment and route across the segments.
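The 16-bit VNI limit mentioned above can be illustrated with a small check. This is only a sketch: the VXLAN VNI is a 24-bit field per RFC 7348, but the assumption that the device simply rejects values above 65535, the helper names, and the sample VNI values are all hypothetical and not taken from the original post.

```python
VXLAN_VNI_BITS = 24  # VNI field width defined in RFC 7348


def fits_in_bits(vni: int, bits: int) -> bool:
    """Return True if vni fits in an unsigned field that is `bits` wide."""
    return 0 <= vni < (1 << bits)


def usable_vnis(vnis, bits):
    """Keep only the VNIs that a platform with a `bits`-wide VNI could accept."""
    return [v for v in vnis if fits_in_bits(v, bits)]


# Hypothetical VNIs an integration test might assign to VLAN segments
test_vnis = [1000, 65535, 65536, 101000]

print(usable_vnis(test_vnis, VXLAN_VNI_BITS))  # all four fit in 24 bits
print(usable_vnis(test_vnis, 16))              # only values up to 65535 survive
```

Under that assumption, a test suite written for 24-bit VNIs has to renumber any segment above 65535 before it can run on such a platform.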

Peeling The Covers Off Germany’s Exascale “Jupiter” Supercomputer

The newest of the exascale-class supercomputers to be profiled in the June Top500 rankings is the long-awaited “Jupiter” system at the Forschungszentrum Jülich facility in Germany. …

Peeling The Covers Off Germany’s Exascale “Jupiter” Supercomputer was written by Timothy Prickett Morgan at The Next Platform.

D2DO275: WebAssembly – The Next Big Thing?

Is WebAssembly the next big thing? Here to help us understand what WebAssembly (WASM) is and what it can and can’t do is Michael Levan, a consultant and WASM trainer. He also dives deeper into WASM details such as hosting, security, monitoring, and the ever-present influence of AI.

We shipped FinalizationRegistry in Workers: why you should never use it

We’ve recently added support for the FinalizationRegistry API in Cloudflare Workers. This API allows developers to request a callback when a JavaScript object is garbage-collected, a feature that can be particularly relevant for managing external resources, such as memory allocated by WebAssembly (Wasm). However, despite its availability, our general advice is: avoid using it directly in most scenarios.

Our decision to add FinalizationRegistry — while still cautioning against using it — opens up a bigger conversation: how memory management works when JavaScript and WebAssembly share the same runtime. This is becoming more common in high-performance web apps, and getting it wrong can lead to memory leaks, out-of-memory errors, and performance issues, especially in resource-constrained environments like Cloudflare Workers.

In this post, we’ll look at how JavaScript and Wasm handle memory differently, why that difference matters, and what FinalizationRegistry is actually useful for. We’ll also explain its limitations, particularly around timing and predictability, walk through why we decided to support it, and how we’ve made it safer to use. Finally, we’ll talk about how newer JavaScript language features offer a more reliable and structured approach to solving these problems.

Memory management 101

JavaScript

JavaScript relies on automatic memory management through a Continue reading

netlab 2.0: Routers, Hosts, Gateways and Bridges

In a previous blog post, I explained how you can use bridges in a netlab topology to create custom LAN segments. netlab supports two other node roles (host and router), and we’ll eventually add gateways.

netlab assumes that most network devices are routers (it considers a firewall to be a router in disguise), apart from Linux hosts, but you can always change what a node is with the role node attribute:

Secure and Scalable Kubernetes for Multi-Cluster Management

This story is becoming more and more common in the Kubernetes world. What starts as a manageable cluster or two can quickly balloon into a sprawling, multi-cluster architecture spanning public clouds, private data centers, or a bit of both. And with that growth comes a whole new set of headaches. How do you keep tabs on compliance across wildly different configurations? When a service goes down across multiple clusters, how do you pinpoint the cause amidst the chaos? And what about those hard-to-diagnose latency issues that seem to crop up between regions?

The truth is, achieving secure and scalable multi-cluster Kubernetes isn’t about throwing more tools at the problem. It’s about having the right tools and adopting the right best practices. This is where a solution like Calico Cluster Mesh shines, offering those essential capabilities for a seamless multi-cluster experience without the complexity or overhead that you expect with traditional service meshes.

The Multi-Cluster Challenge: When Complexity Takes Over

So, why are so many organizations finding themselves in this multi-cluster maze? Often, it’s driven by solid business reasons:

High Availability and Disaster Recovery: Spreading workloads across multiple regions or clusters means that if one goes down, your users shouldn’t notice.
Continue reading

Worth Reading 061025

This report examines the growing global trend of Internet blocking and its impact on the stability, openness, and interoperability of the Internet. It details how governments and private actors are increasingly using network-level interventions—such as DNS blocking, IP address blocking, and protocol filtering—to control online content.

Broadcom began shipping its answer to Nvidia’s upcoming Quantum-X and Spectrum-X switches on Tuesday: the Tomahawk 6. The chip doubles the bandwidth of its predecessor and comes in both standard and co-packaged optics flavors.

Artificial intelligence, once hailed as the great liberator of human productivity and ingenuity, is now moonlighting as a con artist, data thief, and spy.

The current craze for AI has helped drive a wave of datacenter building, but the industry has run into opposition from local communities in many areas, something it is understandably keen to address.

Low orbit space is growing increasingly crowded. Starlink has over 7,100 satellites in orbit and has plans to grow to 30,000. Project Kuiper has plans for a constellation of 3,232 satellites.

PP066: News Roundup – NIST’s New Exploit Metric, Windows RDP Issues, Compromised Routers, and More

Our security news roundup discusses the compromise of thousands of ASUS routers and the need to perform a full factory reset to remove the malware, why Microsoft allows users to log into Windows via RDP using revoked passwords, and the ongoing risk to US infrastructure from “unexplained communications equipment” being found in Chinese-made electrical equipment... Read more »

HW054: Validation Survey Controversies

A validation survey is typically used for wireless infrastructure post-installation. It compares predictions to real wireless network performance. On today’s show we chat with Joel Crane about validation survey controversies and the challenges of producing a survey whose data has integrity. We cover topics such as the perfectly green heat map, how fast you should... Read more »

Top500 Supers: Even Accelerators Can’t Bend Performance Up To The Moore’s Law Line

The International Super Computing 2025 conference is going on this week in Hamburg, Germany, and is celebrating its 40th anniversary. …

Top500 Supers: Even Accelerators Can’t Bend Performance Up To The Moore’s Law Line was written by Timothy Prickett Morgan at The Next Platform.

Interesting: Juniper MX and Jumbo Frames

Did you know that there’s an Ethernet link between the Packet Forwarding Engine (PFE – data plane) and Routing Engine (RE – control plane) in every Juniper MX? That’s why you have to run two VMs to emulate it (sometimes conveniently packed into one larger VM, proving RFC 1925 rule 6a).

That Ethernet link happens to have the MTU fixed at 1500 bytes. Guess what happens in the world where everyone uses jumbo frames? Did you say fragmentation? Bingo! And what do you think happens when one of those fragments gets dropped due to control-plane policing, and the rest of them are stuck in the reassembly queue? You’ll find the gory details in a lengthy blog post by Nitzan Tzelniker.
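To get a feel for the fragmentation described above, here is a rough arithmetic sketch. It assumes plain IPv4 with a 20-byte header and no options, and a 9000-byte jumbo packet as the example size; neither the function name nor those numbers come from the original post.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_count(packet_len: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry a packet_len-byte packet over a link with this MTU."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


print(fragment_count(9000, 1500))  # 7
```

Losing any one of those fragments means the original packet can never be reassembled, which is why a policer that drops individual fragments is so damaging.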

Broadcom At The Crossroads Between Merchant And Custom Silicon

After three relatively short years of explosive growth thanks to the GenAI boom, AI is driving half of systems revenues worldwide already. …

Broadcom At The Crossroads Between Merchant And Custom Silicon was written by Timothy Prickett Morgan at The Next Platform.

AI Metrics with Grafana Cloud

The Grafana AI Metrics dashboard shown above tracks performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.

The metrics include:

Total Traffic: Total traffic entering fabric
Operations: Total RoCEv2 operations broken out by type
Core Link Traffic: Histogram of load on fabric links
Edge Link Traffic: Histogram of load on access ports
RDMA Operations: Total RDMA operations
RDMA Bytes: Average RDMA operation size
Credits: Average number of credits in RoCEv2 acknowledgements
Period: Detected period of compute / exchange activity on fabric (in this case just over 0.5 seconds)
Congestion: Total ECN / CNP congestion messages
Errors: Total ingress / egress errors
Discards: Total ingress / egress discards
Drop Reasons: Packet drop reasons

AI Metrics with Prometheus and Grafana describes how to stand up an analytics stack with Prometheus and Grafana to track performance metrics for an AI/ML GPU cluster. This article shows how to integrate with Prometheus and Grafana hosted in the cloud, Grafana Cloud, instead of running the services locally.

Note: Grafana Cloud has a free service tier that can be used to test this example.

NB530: Broadcom Hits 102.4 Tbps With Tomahawk 6; Wireshark Debuts Certificate Program

Take a Network Break! We start with two critical vulnerabilities: one affecting cloud versions of Cisco ISE, and the other affecting HPE StoreOnce. In the news, Broadcom announces the Tomahawk 6 ASIC with 102.4 Tbps of bandwidth, SentinelOne suffers a self-imposed network outage, and the Wireshark Foundation announces its first-ever professional certification for Wireshark. Cisco rebrands... Read more »

Worth Reading 060925

The story of computing and communications over the past eighty years has been a story of quite astounding improvements in the capability, cost and efficiency of computers and communications.

In recent discussions, it became clear that additional information could be helpful, breaking down what a user or administrator needs to understand about TLS implementation and configuration options to better assess points of potential exposure.

The use of pseudo-random processes to generate secret quantities can result in pseudo-security. A sophisticated attacker may find it easier to reproduce the environment that produced the secret quantities and to search the resulting small set of possibilities than to locate the quantities in the whole of the potential number space.
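The attack described above can be sketched in a few lines. This is an illustration under stated assumptions, not anything taken from the quoted document: the victim is assumed to seed a general-purpose PRNG with a Unix timestamp, and the attacker is assumed to know the hour in which the key was generated. The function names and the timestamp are hypothetical.

```python
import random
import secrets


def weak_key(seed: int) -> bytes:
    """Derive a 16-byte 'secret' from a PRNG seeded with a guessable value."""
    return random.Random(seed).getrandbits(128).to_bytes(16, "big")


def recover_seed(key: bytes, candidates):
    """Replay every candidate seed until one reproduces the observed key."""
    for seed in candidates:
        if weak_key(seed) == key:
            return seed
    return None


# Hypothetical scenario: key generated some time within a known hour
window_start = 1_749_000_000
victim_key = weak_key(window_start + 1234)

# The key is 128 bits long, but the attacker only has to try 3,600 seeds.
found = recover_seed(victim_key, range(window_start, window_start + 3600))
print(found == window_start + 1234)  # True

# Drawing from the operating system's CSPRNG avoids the guessable seed.
strong_key = secrets.token_bytes(16)
```

The key length is irrelevant here: the search space is the set of plausible seeds, not the 2^128 possible keys.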

We’ve all had the serendipity experience, even online — clicking through a chain of links, scanning Google search results, drifting between loosely connected ideas. But search engines and information retrieval systems aren’t designed to enhance serendipity.

Here I want to look at just one day of the operation of the Internet’s BGP network by looking at the behaviour of a single BGP session. The day we’ll use for this study is the 8th May 2025, and the BGP vantage point used here is an unremarkable Continue reading

Meet Noction IRP v4.2.8 – Smarter Routing, Deeper Insights, Tighter Control.

The post Meet Noction IRP v4.2.8 – Smarter Routing, Deeper Insights, Tighter Control. appeared first on Noction.

Getting Started with the Pytest Plugin for Infrahub

We all write code, but how do we know the changes we make in the future won’t break something that used to work? That’s where testing becomes important.

The idea is to catch problems early, ideally before they reach production. In the Python world, one of the most common ways to do this is with a tool called pytest. It lets you write tests to check that your code behaves the way you expect and helps you catch issues before they become a bigger problem.
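For readers who have not used pytest, a minimal test file might look like the sketch below. The helper `expand_interface` and its prefix table are hypothetical and exist only to give the tests something to check; they are not part of pytest or Infrahub.

```python
# Hypothetical helper under test; not part of pytest or Infrahub.
PREFIXES = {"Gi": "GigabitEthernet", "Te": "TenGigabitEthernet"}


def expand_interface(name: str) -> str:
    """Expand an abbreviated interface name such as 'Gi0/1'."""
    for short, full in PREFIXES.items():
        # Skip names that already use the full form.
        if name.startswith(short) and not name.startswith(full):
            return full + name[len(short):]
    return name


# pytest collects functions whose names start with "test_"
# and reports any failing assert.
def test_expands_abbreviation():
    assert expand_interface("Gi0/1") == "GigabitEthernet0/1"


def test_leaves_full_name_untouched():
    assert expand_interface("GigabitEthernet0/1") == "GigabitEthernet0/1"


def test_unknown_prefix_is_returned_as_is():
    assert expand_interface("Loopback0") == "Loopback0"
```

Saved as, say, `test_interfaces.py`, the file can be run with the `pytest` command, which discovers the three test functions automatically.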

Originally published at https://www.opsmill.com/pytest-plugin-infrahub/

When working with Infrahub, testing is just as important. You might want to make sure your GraphQL queries are valid, your Jinja2 templates render correctly, or your transformations behave as expected.

Infrahub simplifies this by offering a pytest plugin that doesn’t require Python code; you define tests using plain YAML. This makes testing more accessible to teams across roles and speeds up the feedback loop during development.

These kinds of unit tests aren’t just about convenience; they help establish a production-ready automation system. With automated checks built into your process, every change is validated consistently, reducing the chance of something breaking unexpectedly. That consistency builds trust when your Continue reading
