There has always been a tension in the datacenter between ever-advancing technology and the practical economic gravity of the company balance sheet. …
The Time Has Come To Upgrade Aging Server Fleets was written by Timothy Prickett Morgan at The Next Platform.
I recently tested the NVIDIA Air Infrastructure Simulation Platform and would like to share my first experiences with you. What is the NVIDIA Air Infrastructure Simulation Platform, or NVIDIA Air? In a nutshell, NVIDIA Air is a cloud-hosted data center simulation platform where you can: test and validate network configurations, features, and automation code; build your own data center topology or choose from an impressive list of pre-built topologies; use Cumulus Linux or SONiC as the network operating system, add Ubuntu nodes, and more; import and export lab topologies. Share the…
The post NVIDIA Air Infrastructure Simulation Platform appeared first on AboutNetworks.net.
Remote Direct Memory Access (RDMA) architecture enables efficient data transfer between Compute Nodes (CN) in a High-Performance Computing (HPC) environment. RDMA over Converged Ethernet version 2 (RoCEv2) utilizes a routed IP Fabric as a transport network for RDMA messages. Due to the nature of RDMA packet flow, the transport network must provide lossless, low-latency packet transmission. The RoCEv2 solution uses UDP in the transport layer, which does not handle packet losses caused by network congestion (buffer overflow on switches or on a receiving Compute Node). To avoid buffer overflow issues, Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) are used as signaling mechanisms to react to buffer threshold violations by requesting a lower packet transfer rate.
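The threshold-driven signaling described above can be sketched roughly as follows. This is a simplified illustration, not how any particular switch implements it: the threshold values, the `QueueThresholds` type, and the `congestion_signal` function are hypothetical, and real ECN marking is usually probabilistic between a minimum and a maximum threshold rather than a single cut-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueThresholds:
    """Hypothetical per-queue buffer thresholds, in kilobytes."""
    ecn_mark_kb: int = 150   # assumed: start ECN-marking packets above this depth
    pfc_xoff_kb: int = 300   # assumed: send a PFC pause frame above this depth


def congestion_signal(queue_depth_kb: int,
                      thresholds: QueueThresholds = QueueThresholds()) -> str:
    """Return the signal a switch queue would raise at a given buffer depth.

    - "forward":   buffer is below both thresholds, nothing to signal
    - "ecn_mark":  mark packets so the receiver can ask the sender to slow down
    - "pfc_pause": ask the upstream neighbor to pause this priority class
    """
    if queue_depth_kb >= thresholds.pfc_xoff_kb:
        return "pfc_pause"
    if queue_depth_kb >= thresholds.ecn_mark_kb:
        return "ecn_mark"
    return "forward"


if __name__ == "__main__":
    for depth in (50, 200, 350):
        print(depth, congestion_signal(depth))
```

Placing the ECN threshold below the PFC threshold reflects the usual intent that end-to-end rate reduction kicks in first and hop-by-hop pausing acts as the last line of defense against buffer overflow.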
Before moving to RDMA processes, let’s take a brief look at the example Compute Nodes (CN) illustrated in Figure 1-1. Both the Client and Server CNs are equipped with one Graphics Processing Unit (GPU). The GPU has a Network Interface Card (NIC) with one interface. Additionally, the GPU has Device Memory Units to which it has a direct connection, bypassing the CPU. In real life, a CN may have several GPUs, each with multiple memory units. GPU-to-GPU communication within the CN happens over high-speed NVLinks. The connection to remote CNs occurs over the NIC, which has at least one high-speed uplink port/interface.
Figure 1-1 also shows the basic idea of a stacked Fine-Grained 3D DRAM (FG-DRAM) solution. In our example, there are four vertically interconnected DRAM dies, each divided into eight Banks. Each Bank contains four memory arrays, each consisting of rows and columns of memory cells (a capacitor accessed through a transistor, whose charge indicates whether a bit is set to 1 or 0). FG-DRAM enables cross-DRAM grouping into Ranks, increasing memory capacity and bandwidth.
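To make the hierarchy in the example concrete, the counts multiply out as shown in the short sketch below. The die, Bank, and array counts come from the example above; the function name is only illustrative.

```python
# Counts taken from the FG-DRAM example in the text.
DIES = 4              # vertically stacked DRAM dies
BANKS_PER_DIE = 8     # Banks on each die
ARRAYS_PER_BANK = 4   # memory arrays in each Bank


def fg_dram_totals(dies: int = DIES,
                   banks_per_die: int = BANKS_PER_DIE,
                   arrays_per_bank: int = ARRAYS_PER_BANK) -> tuple[int, int]:
    """Return (total Banks, total memory arrays) for the stacked device."""
    total_banks = dies * banks_per_die
    total_arrays = total_banks * arrays_per_bank
    return total_banks, total_arrays


if __name__ == "__main__":
    banks, arrays = fg_dram_totals()
    print(f"{banks} Banks, {arrays} memory arrays")  # 32 Banks, 128 memory arrays
```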
The upcoming sections introduce the required processes and operations when the Client Compute Node wants to write data from its device memory to the Server Compute Node’s device memory. I will discuss the design models and requirements for lossless IP Fabric in later chapters.
RFC 4264 defines BGP wedgies as “a class of BGP configurations for which there is more than one potential outcome, and where forwarding states other than the intended state are equally stable.” Even worse, “the stable state where BGP converges may be selected by BGP in a non-deterministic manner.”
Want to know more? You can explore a real-life BGP wedgie and fix it in the latest BGP lab exercise.
polyfill.io, a popular JavaScript library service, can no longer be trusted and should be removed from websites.
Multiple reports, corroborated with data seen by our own client-side security system, Page Shield, have shown that the polyfill service was being used, and could be used again, to inject malicious JavaScript code into users’ browsers. This is a real threat to the Internet at large given the popularity of this library.
We have, over the last 24 hours, released an automatic JavaScript URL rewriting service that will rewrite any link to polyfill.io found in a website proxied by Cloudflare to a link to our mirror under cdnjs. This will avoid breaking site functionality while mitigating the risk of a supply chain attack.
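As a rough idea of what such a rewrite does to a page, here is a minimal sketch. It is not Cloudflare's implementation: the regular expression, the function name, and the mirror path `cdnjs.cloudflare.com/polyfill` are assumptions made for illustration, so check the actual mirror URL before relying on it.

```python
import re

# Assumed mirror location under cdnjs; verify before use.
MIRROR_BASE = "https://cdnjs.cloudflare.com/polyfill"

# Matches http(s) or protocol-relative links to polyfill.io or cdn.polyfill.io,
# but only when a path follows the hostname.
_POLYFILL_URL = re.compile(r"(?:https?:)?//(?:cdn\.)?polyfill\.io(?=/)",
                           re.IGNORECASE)


def rewrite_polyfill_links(html: str) -> str:
    """Point every polyfill.io link in the markup at the mirror, keeping the path."""
    return _POLYFILL_URL.sub(MIRROR_BASE, html)


if __name__ == "__main__":
    page = '<script src="https://cdn.polyfill.io/v3/polyfill.min.js"></script>'
    print(rewrite_polyfill_links(page))
```

Because only the scheme and hostname are replaced, the path and query string of each link are preserved, which is what keeps site functionality intact after the rewrite.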
Any website on the free plan has this feature automatically activated now. Websites on any paid plan can turn on this feature with a single click.
You can find this new feature under Security ⇒ Settings on any zone using Cloudflare.
Contrary to what is stated on the polyfill.io website, Cloudflare has never recommended the polyfill.io service or authorized their use of Cloudflare’s name on their website. We have asked them to remove the …
The jury is still out on a lot of things about this exploding AI market and the re-convergence that it will have with traditional HPC systems for running simulations and models. …
What If Omni-Path Morphs Into The Best Ultra Ethernet? was written by Timothy Prickett Morgan at The Next Platform.
On Thursday, June 20, 2024, two independent events caused an increase in latency and error rates for Internet properties and Cloudflare services that lasted 114 minutes. During the 30-minute peak of the impact, we saw that 1.4 - 2.1% of HTTP requests to our CDN received a generic error page, and observed a 3x increase for the 99th percentile Time To First Byte (TTFB) latency.
These events occurred because:
Impact from these events was observed in many Cloudflare data centers around the world.
With respect to the backbone congestion event, we were already working on expanding backbone capacity in the affected data centers, and improving our network mitigations to use more information about the available capacity on alternative network paths when taking action. In the remainder of this blog post, we will go into …
I plan to add several challenge labs using multihop EBGP sessions to the BGP labs project, including:
However, I would love to start with a simple use case to help engineers unfamiliar with BGP realize when they might have to use multihop EBGP sessions. Unfortunately, I can’t find one, and the scenarios where I used multihop EBGP in the past (EBGP load balancing and using a low-end router in the EBGP path, where I was effectively using the reverse application of #2 as a customer) are mostly irrelevant.
Would you have an easy-to-understand use case that is best solved with a multihop EBGP session? Please share it in the comments. Thanks a million!
I’ve been working with Palo Alto Firewalls and Panorama for a few years now, yet the best ways to use Templates still seem somewhat mysterious. I bet many of you feel the same way. Since every network is unique, there isn’t one “right” way to manage this. In this blog post, I’ll break down what Templates and Template Stacks are in Panorama and share some effective strategies for organizing them. Let’s dive in.
If you’re new to Panorama, it’s a centralized management tool that simplifies managing multiple Palo Alto firewalls from a single place. Two key concepts in Panorama are Device Groups and Templates. Device Groups manage the configurations you’d usually find under the Policies and Objects tabs on the firewall, while Templates manage configurations from the Network and Device tabs.
It’s important to note that Device Groups and Templates serve different purposes and manage different parts of the configurations. This blog post will focus exclusively on Templates. If you need a refresher on Device Groups and Templates, I’ve covered that in a previous post. Feel free to check it out here for a quick recap.
Designing chips and shepherding them through the foundry and package and assembly is a complex and difficult process, and not having these skills at a national level has profound implications for the competitiveness of those nations. …
Ruminations About Europe’s “Alice Recoque” Exascale Supercomputer was written by Timothy Prickett Morgan at The Next Platform.
Broadcom’s $69 billion acquisition of virtualization stalwart VMware was not an easy proposition. …
Cloud Foundation Updates Reflect The New VMware By Broadcom was written by Jeffrey Burt at The Next Platform.
Kubernetes is used everywhere, from test environments to the most critical production foundations that we use daily, making it undoubtedly a de facto standard in cloud computing. While this is great news for everyone who works with, administers, and expands Kubernetes, the downside is that it makes Kubernetes a favorable target for malicious actors.
Malicious actors typically exploit flaws in the system to gain access to a portion of the environment. They then chain these flaws together to move laterally within the environment, ultimately seeking root access or access to critical information.
While the best way to fix security flaws in any software is to patch it with appropriate fixes that the project maintainers publish, there are certain security practices that you can adopt to fortify your environment, like using network policies. However, most people find network policies complex and overwhelming, which discourages them from implementing policies in their environment.
In this blog post, we will examine four pain points that people face when they want to implement network policies and provide solutions to help you effectively secure your Kubernetes environment.
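For readers who have not written a network policy before, the sketch below builds a minimal default-deny ingress policy as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It is an illustrative starting point, not a policy taken from the post; the policy name, the `demo` namespace, and the helper function are hypothetical.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that denies all ingress to pods in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "default-deny-ingress",  # hypothetical policy name
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {},           # empty selector: applies to every pod
            "policyTypes": ["Ingress"],  # no ingress rules listed, so all ingress is denied
        },
    }


if __name__ == "__main__":
    # "demo" is a placeholder namespace used only for this example.
    print(json.dumps(default_deny_ingress("demo"), indent=2))
```

Starting from a default-deny policy and then adding narrowly scoped allow rules is a common way to keep the policy set small enough to reason about.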
In Kubernetes, a network policy (KNP) resource is the …