In a world where allocations of “Hopper” H100 GPUs coming out of Nvidia’s factories are going out well into 2024, and the allocations for the impending “Antares” MI300X and MI300A GPUs are probably long since spoken for, anyone trying to build a GPU cluster to power a large language model for training or inference has to think outside of the box. …
In this episode, Michael, Kristina, and Adriana Villela discuss the challenges and benefits of running Kubernetes on Nomad. Adriana shares her experience of using Nomad in a data center, highlighting its simplicity and ease of deployment compared to Kubernetes. The speakers also discuss the differences between the two platforms, the concept of vendor lock-in, and...
The dashboard displays data gathered from open source Host sFlow agents installed on Data Transfer Nodes (DTNs) run by the Caltech High Energy Physics Department and used for handling transfer of large scientific data sets (for example, accessing experiment data from the CERN particle accelerator). Network performance monitoring describes how the Host sFlow agents augment standard sFlow telemetry with measurements that the Linux kernel maintains as part of the normal operation of the TCP protocol stack.
The dashboard shows five large flows (each greater than 50 gigabits per second). For each large flow being tracked, additional TCP performance metrics are displayed:
RTT: The round trip time observed between DTNs.
RTT Wait: The amount of time that data waits on the sender before it can be sent.
RTT Sdev: The standard deviation of the observed RTT. This variation is a measure of jitter.
Avg. Packet Size: The average packet size used to send data.
Packets in Flight: The number of unacknowledged packets.
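To make the metric definitions above concrete, here is a minimal Python sketch that derives the same kinds of values from a list of per-interval samples. The FlowSample type, its field names, and the summarize and is_large_flow helpers are hypothetical and exist only for illustration. This is not how Host sFlow or the Linux kernel computes or exports these values.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class FlowSample:
    """One hypothetical measurement interval for a single TCP flow."""
    rtt_ms: float          # round trip time observed in this interval
    bytes_sent: int        # payload bytes sent in this interval
    packets_sent: int      # packets sent in this interval
    unacked_packets: int   # packets sent but not yet acknowledged


# Threshold taken from the text: a "large" flow exceeds 50 gigabits per second.
LARGE_FLOW_BPS = 50e9


def is_large_flow(bits_per_second: float) -> bool:
    """Return True when the flow rate exceeds the large-flow threshold."""
    return bits_per_second > LARGE_FLOW_BPS


def summarize(samples: List[FlowSample]) -> Dict[str, float]:
    """Summarize a non-empty list of samples into dashboard-style metrics."""
    rtts = [s.rtt_ms for s in samples]
    total_bytes = sum(s.bytes_sent for s in samples)
    total_packets = sum(s.packets_sent for s in samples)
    return {
        "rtt_ms": mean(rtts),
        # Population standard deviation of RTT, used here as a jitter measure.
        "rtt_sdev_ms": pstdev(rtts),
        "avg_packet_size": total_bytes / total_packets if total_packets else 0.0,
        # Most recent count of unacknowledged packets.
        "packets_in_flight": samples[-1].unacked_packets,
    }
```

A rising rtt_sdev_ms alongside a steady rtt_ms would suggest growing jitter on an otherwise stable path, which is the kind of signal these per-flow metrics are meant to surface.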
Terry Slattery joins Tom and Russ to continue the conversation on network automation—and why networks are not as automated as they should be. This is part one of a two-part series; the second part will be published in two weeks as Hedge episode 204.
As IPv6 usage continues its inevitable rise, now’s the time to admit you have or soon will have some IPv6 on the network and to engage with the IPv6 plane to incorporate it into your management strategy.
The basis of Zero Trust is defining granular controls and authorization policies per application, user, and device. Having a system with a sufficient level of granularity to do this is crucial to meet both regulatory and security requirements. But there is a potential downside to so many controls: in order to troubleshoot user issues, an administrator has to consider a complex combination of variables across applications, user identity, and device information, which may require painstakingly sifting through logs.
We think there’s a better way — which is why, starting today, administrators can easily audit all active user sessions and associated data used by their Cloudflare One policies. This enables the best of both worlds: extremely granular controls, while maintaining an improved ability to troubleshoot and diagnose Zero Trust deployments in a single, simple control panel. Information that previously lived in a user’s browser or changed dynamically is now available to administrators without the need to bother an end user or dig into logs.
A quick primer on application authentication and authorization
Authentication and Authorization are the two components that a Zero Trust policy evaluates before allowing a user access to a resource.
Authentication is the process of verifying the identity …
Network engineers and architects considering IPv6 can benefit from the experiences of those who have gone before them by avoiding the problems that have bedeviled other deployments. On today’s show, your hosts discuss three typical pitfalls and how to get over or around them without falling in. Those IPv6 pitfalls include: IPv4 thinking; deploying ULA...
First, let me just say that you have got to love a zero-indexed conference! If you are a network engineer and you don’t know what that means, we need to chat, and that situation was a key topic of the conference. In my mind, the goal of the conference was to assess the state of …
Microsoft has announced that it is partnering with chipmaker Nvidia and chip-designing software provider Synopsys to provide enterprises with foundry services and a new chip-design assistant. The announcement was made at the ongoing Microsoft Ignite conference. The foundry services from Nvidia, which will be deployed on Microsoft Azure, will combine three of Nvidia’s elements: its foundation models, its NeMo framework, and its DGX Cloud service.
Cloudflare experienced a significant outage in early November 2023 and published a detailed post-mortem report. You should read the whole report; here are my CliffsNotes:
If we are going to update RFC 3901, "DNS IPv6 Transport Guidelines," and offer a revised set of guidelines that is more positive about the use of IPv6 in the DNS, then what should such updated guidelines say?
After many years of rumors, Microsoft has finally confirmed that it is following rivals Amazon Web Services and Google into the design of custom processors and accelerators for their clouds. …
The conference network used in the demonstration, SCinet, is described as the most powerful and advanced network on Earth, connecting the SC community to the world.
In this example, the sFlow-RT real-time analytics engine receives sFlow telemetry from switches, routers, and servers in the SCinet network and creates metrics to drive the real-time heatmap. Getting Started provides a quick introduction to deploying and using sFlow-RT for real-time network-wide flow analytics.
While 95% of businesses are aware that AI will increase infrastructure workloads, only 17% have networks that are flexible enough to handle the complex requirements of AI. Given that disconnect, it’s too early to see widespread deployment of AI at scale, despite the hype. That's one of the key takeaways from Cisco’s inaugural AI Readiness Index, a survey of 8,000 global companies aimed at measuring corporate interest in and ability to utilize AI technologies.
Welcome to a special edition of Day Two Cloud. Host Ned Bellavance traveled to KubeCon Chicago 2023 and spoke to vendors and open source maintainers about what’s going on in the cloud-native ecosystem. This episode features conversations on platform engineering. Part 2 will focus on security. Episode Guests: Cole Morrison, Developer Advocate at HashiCorp...