Last year, amid all the talk of the “Blackwell” datacenter GPUs launched at the GPU Technology Conference, Nvidia also introduced the idea of Nvidia Inference Microservices, or NIMs, which are prepackaged enterprise-grade generative AI software stacks that companies can use as virtual copilots to add custom AI software to their own applications. …
Nvidia NeMo Microservices For AI Agents Hits The Market was written by Jeffrey Burt at The Next Platform.
China has lots of coal but it does not have a lot of GPUs or other kinds of tensor and vector math accelerators appropriate for HPC and AI. …
The Separate But Equal AI Realms Of China And The US was written by Timothy Prickett Morgan at The Next Platform.
PARTNER CONTENT: “Developers have to build it, right, and their first concern is to make it work,” says CentML chief executive officer Gennady Pekhimenko. …
Freeing Developers From GenAI Deployment Nightmares was written by Timothy Prickett Morgan at The Next Platform.
Cloudflare’s network spans more than 330 cities in over 125 countries, where we interconnect with over 13,000 network providers in order to provide a broad range of services to millions of customers. The breadth of both our network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the impact of Internet disruptions at both a local and national level, as well as at a network level.
As we have noted in the past, this post is intended as a summary overview of observed and confirmed disruptions, and is not an exhaustive or complete list of issues that have occurred during the quarter. A larger list of detected traffic anomalies is available in the Cloudflare Radar Outage Center. Note that both bytes-based and request-based traffic graphs are used within the post to illustrate the impact of the observed disruptions — the choice of metric was generally made based on which better illustrated the impact of the disruption.
In the first quarter of 2025, we observed a significant number of Internet disruptions due to cable damage and power outages. Severe storms caused outages in Ireland and Réunion, and an earthquake caused ongoing connectivity issues Continue reading
Segment Routing simplifies MPLS for the network operator – but not for the developer.
Consider the topology:

I want to steer traffic from R1 to R7 using only blue links. R1 (or controller) runs Constrained Shortest …
The amount of weird stuff we discover in netlab integration tests is astounding, or maybe I have a knack for looking into the wrong dark corners (my wife would definitely agree with that). Today’s special: when having two next hops kills a static route.
TL&DR: default ARP settings on a multi-subnet Linux host are less than optimal.
We use these principles when creating netlab integration tests:
How do you test static routes under these restrictions? Here’s what we did:
What is UV? Astral's uv is a fast, all-in-one Python package and project manager written in Rust that unifies and accelerates Python development workflows by replacing multiple tools, including pip, pip-tools, poetry, pipx, pyenv, virtualenv, and twine, and by handling actions such as initializing a git repository and creating base files like .gitignore and pyproject.toml (think of this as requirements.txt READ MORE
The post Ultra Valuable uv for Dynamic, On-Demand Python Virtual Environments appeared first on The Gratuitous Arp.
The semiconductor manufacturing business is absolutely immense. To give the numbers some perspective, in 2024, chip makers generated revenues that were about three quarters of the size of the US defense budget and about two-thirds the size of the social services budget allocated by Congress. …
The Chips Are Definitely Not Down was written by Timothy Prickett Morgan at The Next Platform.
The metrics include:
This article gives step-by-step instructions to set up the dashboard in a production environment.
git clone https://github.com/sflow-rt/prometheus-grafana.git
sed -i -e 's/prometheus/ai-metrics/g' prometheus-grafana/env_vars
./prometheus-grafana/start.sh
The easiest way to get started is to use Docker, see Deploy real-time network dashboards using Docker compose, and deploy the sflow/ai-metrics image bundling the AI Metrics application to generate metrics.
scrape_configs:
  - job_name: 'sflow-rt-ai-metrics'
    metrics_path: /app/ai-metrics/scripts/metrics.js/prometheus/txt
    scheme: Continue reading
Though BGP supports the traditional Flow-based Layer 3 Equal Cost Multi-Pathing (ECMP) traffic load balancing method, it is not the best fit for a RoCEv2-based AI backend network. This is because GPU-to-GPU communication creates massive elephant flows, which RDMA-capable NICs transmit at line rate. These flows can easily cause congestion in the backend network.
In ECMP, all packets of a single flow follow the same path. If that path becomes congested, ECMP does not adapt or reroute traffic. This leads to uneven bandwidth usage across the network. Some links become overloaded, while others remain idle. In AI workloads, where multiple high-bandwidth flows occur at the same time, this imbalance can degrade performance.
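The pinning behavior described above can be illustrated with a small sketch. This is not any vendor's actual hash function: the SHA-256 hash, the helper names `ecmp_pick` and `link_load`, and the sample addresses and rates are all assumptions chosen for illustration. The only point is that a flow's path depends on its header fields alone, never on how busy that path is.

```python
import hashlib
from collections import Counter

def ecmp_pick(flow, num_paths):
    """Pick an uplink index from a hash of the flow's 5-tuple.

    Hypothetical hash choice: real switches use their own hardware hash.
    The same 5-tuple always yields the same index, regardless of load.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

def link_load(flows, num_paths):
    """Sum the offered rate (Gbps) that lands on each uplink."""
    load = Counter({path: 0 for path in range(num_paths)})
    for flow, gbps in flows:
        load[ecmp_pick(flow, num_paths)] += gbps
    return dict(load)

if __name__ == "__main__":
    # Four illustrative elephant flows: (src IP, dst IP, protocol, src port, dst port).
    # UDP port 4791 is the RoCEv2 destination port; everything else is made up.
    flows = [
        (("10.0.0.1", "10.0.1.1", "udp", 49152 + i, 4791), 400)
        for i in range(4)
    ]
    print(link_load(flows, num_paths=4))
```

With only a handful of large flows, nothing prevents two or more of them from hashing to the same uplink while another uplink carries nothing, which is the imbalance the paragraph above describes.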
Deep learning models rely heavily on collective operations like all-reduce, all-gather, and broadcast. These generate dense traffic patterns between GPUs, often at terabit-per-second speeds. If these flows are not evenly distributed, a single congested path can slow down the entire training job.
This chapter introduces two alternatives to traditional flow-based Layer 3 ECMP load balancing: 1) Flowlet-Based Load Balancing with Adaptive Routing, and 2) Packet-Based Load Balancing with Packet Spraying. Both aim to improve traffic distribution in RoCEv2-based AI backend networks, where conventional flow-based routing often Continue reading
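As a rough sketch of the general ideas behind the two methods named above (not the chapter's own algorithms, which are not shown in this excerpt), the classes below model a flowlet balancer that may move a flow to the least-loaded path after an idle gap, and a sprayer that rotates paths per packet. The class names, the `gap_us` threshold, and the byte-count load metric are assumptions for illustration only.

```python
class FlowletBalancer:
    """Re-selects a path only when a flow has been idle longer than gap_us."""

    def __init__(self, num_paths, gap_us):
        self.gap_us = gap_us                # hypothetical idle threshold
        self.bytes_sent = [0] * num_paths   # simplistic stand-in for link load
        self.state = {}                     # flow -> (path, last_seen_us)

    def route(self, flow, now_us, size):
        entry = self.state.get(flow)
        if entry is None or now_us - entry[1] > self.gap_us:
            # New flowlet: pick the path that has carried the fewest bytes.
            path = min(range(len(self.bytes_sent)),
                       key=self.bytes_sent.__getitem__)
        else:
            # Same flowlet: stay on the current path to preserve ordering.
            path = entry[0]
        self.state[flow] = (path, now_us)
        self.bytes_sent[path] += size
        return path


class PacketSprayer:
    """Sends each packet on the next path in round-robin order."""

    def __init__(self, num_paths):
        self.num_paths = num_paths
        self.next_path = 0

    def route(self, flow, now_us, size):
        path = self.next_path
        self.next_path = (self.next_path + 1) % self.num_paths
        return path


if __name__ == "__main__":
    balancer = FlowletBalancer(num_paths=2, gap_us=50)
    for t in (0, 10, 100):
        print("flowlet", t, balancer.route("flow-a", t, 1000))
    sprayer = PacketSprayer(num_paths=2)
    print("spray", [sprayer.route("flow-a", t, 1000) for t in range(4)])
```

The trade-off the sketch makes visible: the flowlet approach keeps packets of one burst on one path, while spraying spreads load most evenly but can deliver packets of a single flow out of order, so the receiver has to tolerate reordering.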
The AI boom has been very, very good to Taiwan Semiconductor Manufacturing Co, which is positioned to do well if Nvidia continues with its hegemony over AI training and inference or if the rebel alliance forms behind AMD or if the hyperscalers and cloud builders dedicate a substantial portion of their capital budgets to etching and packaging homegrown compute engines. …
TSMC: The Second Most Profitable Company In The AI Revolution was written by Timothy Prickett Morgan at The Next Platform.