Archive

Category Archives for "Networking"

DNS OARC 43

The DNS Operations, Analysis, and Research Center (DNS-OARC) brings together DNS service operators, DNS software implementors, and researchers together to share concerns, information and learn together about the operation and evolution of the DNS. The most recent DNS OARC workshop was held in Prague, October 2024. Here are my thoughts on some of the material that was presented and discussed at this workshop.

NVIDIA Cumulus Linux 5.11 for AI / ML


NVIDIA Cumulus Linux 5.11 includes major upgrades to the sFlow agent that fully exposes the advanced instrumentation built into NVIDIA Spectrum-X silicon. The enhanced real-time telemetry is particularly relevant to the AI / machine learning workloads that Spectrum-X is designed to handle.

With Cumulus Linux 5.11, the sFlow agent is easily configured using nvue commands, see Monitoring System Statistics and Network Traffic with sFlow:

nv set system sflow dropmon hw
nv set system sflow poll-interval 20
nv set system sflow collector 192.0.2.1
nv set system sflow state enabled
nv config apply

Note: In this case, enabling dropmon ensures that every dropped packet is captured, along with ingress port and drop reason (e.g. ttl_exceeded).

The same commands should be applied to every switch in the fabric for comprehensive visibility.

RDMA over Converged Ethernet (RoCE) describes how sFlow provides detailed visibility into RoCE flows used to move data between GPUs in an AI / ML data center fabric. The chart above from the RDMA network visibility demonstration at the SC22 conference shows that sFlow monitoring easily scales to the 400/800G speeds needed for machine learning.
In this example, the sFlow-RT real-time analytics engine receives sFlow Continue reading

Tech Bytes: MinIO Optimizes Object Storage for AI Infrastructure (Sponsored)

Today on the Tech Bytes podcast we welcome back sponsor MinIO to talk about how AI is altering the data infrastructure landscape, and why organizations are looking to build AI infrastructure on-prem. We also dig into MinIO’s AIStor, a software-only, distributed object store that offers simplicity, scalability, and performance for AI infrastructure and other high-performance... Read more »

How We Made Fast Cloud-to-Cloud Connections Possible

The traditional way to enable things hosted in different clouds to talk to each other has been to order a physical cross-connect (a literal cable) from a colo provider that hosts the clouds’ “onramps” (private connections to the clouds’ networks) to link them. Various software-defined alternatives have emerged in recent years. Some do essentially the same thing but virtually, while others, like typical SD-WANs, rely on things like IPsec or other encryption protocols to secure connections traversing the internet. There are other, less mature approaches, including some cloud providers’ own intercloud connectivity services. And there’s always the option to send your cloud-to-cloud API requests over the public internet and hope for the best. The traditional ways tend to be costly and complex to set up, requiring specialized networking knowledge. A little more than a year ago, we decided to build a cloud-to-cloud private connectivity service that would be easier to use. The goal was to give developers who are proficient in cloud but not in networking a way to create multicloud connections and manage them with familiar tools, like Terraform or Pulumi, without hosting any infrastructure in our data centers. It was an interesting challenge, not in the least from Continue reading

SC24 Dropped packet visibility demonstration

The real-time dashboard is a joint InMon / Arista demonstration at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC24) conference being held this week in Atlanta.

The conference network used in the demonstration, SCinet, is described as the most powerful and advanced network on Earth, connecting the SC community to the world.

The sFlow Packet Drop Monitoring In High Performance Networks dashboard combines telemetry from all the Arista switches in the SCinet network to provide real-time network-wide view of performance. Each of the three charts demonstrate a different type of measurement in the sFlow telemetry stream:

  • Counters: Total Traffic shows total traffic calculated from interface counters streamed from all interfaces. Counters provide a useful way of accurately reporting byte, frame, error and discard counters for each network interface. In this case, the chart rolls up data from all interfaces to trend total traffic on the network.
  • Samples: Top Flows shows the top 5 largest traffic flows traversing the network. The chart is based on sFlow's random packet sampling mechanism, providing a scaleable method of determining the hosts and services responsible for the traffic reported by the counters. Visibility into top flows is essential if one Continue reading

From Python to Go 003. Functions, External Modules, And Linux/MAC Environment.

Hello my friend,

We continue our journey in the realm of software development, or how some people call it programming. In the previous blog post we’ve introduced you to variables, which are the storage containers for your data. Today we’ll take a look into functions, which are next essential building of any application.

How Is Automation Important Today?

Interviewing people to various roles at high-profiles companies, I was sadly surprised that there is very low number of people who really understand network and IT infrastructure automation and are capable to write the Python code to do so. And I’m talking about companies, who are genuinely interested in good automation engineers. Don’t waste your chance, start learning network automation today:

We offer the following training programs in network automation for you:

During these trainings you will learn the following topics:

  • Success and failure strategies to build the automation tools.
  • Principles of software developments and the most useful and convenient tools.
  • Data encoding (free-text, XML, JSON, YAML, Protobuf).
  • Model-driven network automation with YANG, NETCONF, RESTCONF, GNMI.
  • Full Continue reading

HN758: How Selector Built an AI Language Model for Networking (Sponsored)

On today’s episode, artificial intelligence with sponsor Selector.AI. If you’re curious and maybe still skeptical about the value AI brings to network operations, listen to this episode. Selector is on the forefront of AIOps for networking, building models that are customized and specifically targeted at networks. What Selector is doing is NOT simply the low-hanging... Read more »

Introduction to Arista PyeAPI

Introduction to Arista PyeAPI

The Python Client for eAPI (pyeapi) is a Python library that simplifies working with Arista eAPI, removing the need to deal with the specifics of its implementation. It's straightforward to configure and use. In this blog post, we'll look at how to install pyeapi and go through some examples.

If you're familiar with Arista's eAPI, you know that you can browse to the device's IP in a browser, run commands, and get the output directly. You can also achieve the same result using Python, but it typically requires understanding which libraries to use and how to construct the REST API requests.

However, pyeapi simplifies all of this. You don't need to worry about what's happening behind the scenes. Below is a screenshot of running show vlan command via the REST API, and in the following examples, we'll see how to get the same output using pyeapi.

Introduction to Arista PyeAPI

PyeAPI Installation

To install pyeapi, you can use pip, which is the standard package manager for Python. It's a good practice to use a virtual environment (venv) to keep your dependencies isolated and avoid conflicts with other projects. First, create and activate a virtual environment. Once your virtual Continue reading

N4N003: What’s a VLAN?

Today we explore Virtual Local Area Networks (VLANs). This topic was prompted by a question from college student Douglas. We’ll explain the fundamental concepts of VLANs, such as their role in segmenting and managing network traffic, and the technical details for implementation. We’ll also address key topics including VLAN tags, access and trunk ports, and... Read more »

What’s new in Cloudflare: Account Owned Tokens and Zaraz Automated Actions

In October 2024, we started publishing roundup blog posts to share the latest features and updates from our teams. Today, we are announcing general availability for Account Owned Tokens, which allow organizations to improve access control for their Cloudflare services. Additionally, we are launching Zaraz Automated Actions, which is a new feature designed to streamline event tracking and tool integration when setting up third-party tools. By automating common actions like pageviews, custom events, and e-commerce tracking, it removes the need for manual configurations.

Improving access control for Cloudflare services with Account Owned Tokens

Cloudflare is critical infrastructure for the Internet, and we understand that many of the organizations that build on Cloudflare rely on apps and integrations outside the platform to make their lives easier. In order to allow access to Cloudflare resources, these apps and integrations interact with Cloudflare via our API, enabled by access tokens and API keys. Today, the API Access Tokens and API keys on the Cloudflare platform are owned by individual users, which can lead to some difficulty representing services, and adds an additional dependency on managing users alongside token permissions.

What’s new about Account Owned Tokens

First, a little explanation because the terms can Continue reading

AI for Network Engineers: Convolutional Neural Network

 Introduction


The previous chapter explained how Feed-forward Neural Networks (FNNs) can be used for multi-class classification of 28 x 28 pixel handwritten digits from the MNIST dataset. While FNNs work well for this type of task, they have significant limitations when dealing with larger, high-resolution color images.

In neural network terminology, each RGB value of an image is treated as an input feature. For instance, a high-resolution 600 dpi RGB color image with dimensions 3.937 x 3.937 inches contains approximately 5.58 million pixels, resulting in roughly 17 million RGB values.

If we use a fully connected FNN for training, all these 17 million input values are fed into every neuron in the first hidden layer. Each neuron must compute a weighted sum based on these 17 million inputs. The memory required for storing the weights depends on the numerical precision format used. For example, using the 16-bit floating-point (FP16) format, each weight requires 2 bytes. Thus, the memory requirement per neuron would be approximately 32 MB. If the first hidden layer has 10,000 neurons, the total memory required for storing the weights in this layer would be around 316 GB.

In contrast, Convolutional Neural Networks (CNNs) use Continue reading

D2DO257: Love is in the Firmware

With the current cultural emphasis on AI and how that is changing our world, we often forget the human element of individuals and teams when building an effective software development team. On today’s Day Two DevOps, we explore the psychology behind software teams, psychological safety for those teams and how the advent of AI plays... Read more »

PP039: Securing Active Directory from a Pen Tester’s Perspective

Microsoft’s Active Directory and Entra ID are valuable targets for attackers because they store critical identity information. On today’s Packet Protector, we talk with penetration tester and security consultant Eric Kuehn about how he approaches compromising AD/Entra ID, common problems he sees during client engagements, quick wins for administrators and security pros to fortify their... Read more »

Worldwide deployment of real-time flow analytics

Industry standard sFlow telemetry is widely supported by network equipment vendors and network management platforms. However, the advent of real-time sFlow analytics has opened up a range of new applications for sFlow. The map above shows the proportion of sFlow-RT instances running in each of the over 70 countries in which it is deployed.

The following use cases are driving current deployments:

Addressing the challenge of operating AI / ML clusters is the emerging application for sFlow visibility. High speed (400/800G) data center switches needed to handle machine learning traffic flows include sFlow agents and real-time analytics are essential to optimize the network so that expensive GPU and compute resources are fully utilized, see Leveraging open technologies to monitor packet drops in AI cluster fabrics.

If you would like to see how real-time network analytics can transform network operations, Getting Started describes how to download and configure sFlow-RT analytics software for use in your network, or how to try it out using an emulator, or pre-captured data.