Peter

Author Archives: Peter

Broadcom Mirror on Drop (MoD)

Networking Field Day 23 included a presentation by Bhaskar Chinni describing Broadcom's Mirror-on-Drop (MOD) capability. MOD capable hardware can generate a notification whenever a packet is dropped by the ASIC, reporting the packet header and the reason that the packet was dropped. MOD is supported by Trident 3, Tomahawk 3,  and Jericho 2 or later ASICs that are included in popular data center switches and widely deployed in data centers.

The recently published sFlow Dropped Packet Notification Structures specification adds drop notifications to industry standard sFlow telemetry export, complementing the existing push based counter and packet sampling measurements. The inclusion of drop monitoring in sFlow will allow the benefits of MOD to be fully realized, ensuring consistent end-to-end visibility into dropped packets across multiple vendors and network operating systems.

Using Advanced Telemetry to Correlate GPU and Network Performance Issues demonstrates how packet drop notifications from NVIDA Mellanox switches forms part of an integrated sFlow telemetry stream that provides the system wide observability needed to drive automation.

MOD instrumentation on Broadcom based switches provides the foundation needed for network vendors to integrate the Continue reading

Using Advanced Telemetry to Correlate GPU and Network Performance Issues


The image above was captured from the recent talk Using Advanced Telemetry to Correlate GPU and Network Performance Issues [A21870] presented at the NVIDIA GTC conference. The talk includes a demonstration of monitoring a high performance GPU compute cluster in real-time. The real-time dashboard provides an up to the second view of key performance metrics for the cluster.

This diagram shows the elements of the GPU compute cluster that was demonstrated. Cumulus Linux running on the switches reduces operational complexity by allowing you to run the same Linux operating system on the network devices as is run on the compute servers. sFlow telemetry is generated by the open source Host sFlow agent that runs on the servers and the switches, using standard Linux APIs to enable instrumentation and gather measurements. On switches, the measurements are offloaded to the ASIC to provide line rate monitoring.

Telemetry from all the switches and servers in the cluster is streamed to an sFlow-RT analyzer, which builds a real-time view of performance that can be used to drive operational dashboards and automation.

The Real-time GPU and network telemetry dashboard combines measurements from all the devices to provide view of cluster performance. Each of the three Continue reading

Cumulus Linux 4.2

Cumulus Linux is a network operating system for open networking hardware. Cumulus VX is a free virtual appliance that allows network engineers to experiment with Cumulus Linux and verify configurations before deploying into production. 
The Cumulus VX documentation describes how to build network topologies in KVM, VirtualBox, using VMWare hypervisors. If you want to run virtual machines locally, Cumulus in the Cloud is a free service that will allow you to access pre-built networks in the public cloud.

A key feature of Cumulus Linux is the use of the Linux kernel as the authoritative repository of network state. A result of this approach is that the behavior of a Cumulus Linux VX virtual appliance is the same as Cumulus Linux running on a hardware switch. For example, the open source FRR routing daemon shipped with Cumulus Linux uses the Linux netlink API to push routes to the kernel, which forwards packets in the virtual appliance. On a physical switch, routes are still pushed to the kernel, but kernel routing configuration is then offloaded to the switch ASIC so that packets bypass the kernel and are routed by hardware.

Cumulus Linux includes the open source Host sFlow agent. Here again, Continue reading

Real-time trending of dropped packets

Discard Browser is a recently released open source application running on the sFlow-RT real-time analytics engine. The software uses streaming analytics to trend dropped packets.
Using sFlow to monitor dropped packets describes the recently added packet drop monitoring functionality added to the open source Host sFlow agent. The article describes how to install and configure the agent on Linux-based platforms and stream industry standard sFlow telemetry to an sFlow collector.

Visibility into dropped packets describes instrumentation, recently added to the Linux kernel, that provides visibility into packets dropped by the kernel data path on a host, or dropped by a switch ASIC when packets are forwarded in hardware.  Extending sFlow to provide visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well the reason the packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.

Packet discard monitoring complements sFlow's existing counter polling and packet sampling mechanisms and shares a common data model so that all three sources of data can be correlated.  For example, if packets are being discarded because of buffer exhaustion, the discard records don't necessarily Continue reading

Using sFlow to monitor dropped packets

Visibility into dropped packets describes instrumentation, recently added to the Linux kernel, that provides visibility into packets dropped by the kernel data path on a host, or dropped by a switch ASIC when packets are forwarded in hardware. This article describes integration of drop monitoring in the open source Host sFlow agent and inclusion of drop reporting as part of industry standard sFlow telemetry.

Extending sFlow to provide visibility into dropped packets offers significant benefits for network troubleshooting, providing real-time network-wide visibility into the specific packets that were dropped as well the reason the packet was dropped. This visibility instantly reveals the root cause of drops and the impacted connections.

Packet discard monitoring complements sFlow's existing counter polling and packet sampling mechanisms and shares a common data model so that all three sources of data can be correlated.  For example, if packets are being discarded because of buffer exhaustion, the discard records don't necessarily tell the whole story. The discarded packets may represent mice flows that are victims of an elephant flow. Packet samples will reveal the traffic that isn't being dropped and provide a more complete picture. Counter data adds additional information such as CPU load, interface speed, Continue reading

Visibility into dropped packets

Dropped packets have a profound impact on network performance and availability. Packet discards due to congestion can significantly impact application performance. Dropped packets due to black hole routes, expired TTLs, MTU mismatches, etc. can result in insidious connection failures that are time consuming and difficult to diagnose.

Devlink Trap describes recent changes to the Linux drop monitor service that provide visibility into packets dropped by switch ASIC hardware. When a packet is dropped by the ASIC, an event is generated that includes the header of the dropped packet and the reason why it was dropped. A hardware policer is used to limit the number of events generated by the ASIC to a rate that can be handled by the Linux kernel. The events are delivered to userspace applications using the Linux netlink service.

Running the dropwatch command line tool on an Ubuntu 20 system demonstrates the instrumentation:
pp@ubuntu20:~$ sudo dropwatch
Initializing null lookup method
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: __udp4_lib_rcv+0xae5/0xbb0 (0xffffffffb05ead95)
origin: software
input port ifindex: 2
timestamp: Wed Jul 15 23:57:36 2020 223253465 nsec
protocol: 0x800
length: 128
original Continue reading

Large flow marking using BGP Flowspec

Elephant Detection in Virtual Switches & Mitigation in Hardware discusses a VMware and Cumulus demonstration, Elephants and Mice, in which the virtual switch on a host detects and marks large "Elephant" flows and the hardware switch enforces priority queueing to prevent Elephant flows from adversely affecting latency of small "Mice" flows.

SDN and WAN optimization describes a presentation by Amin Vahdat describing Google's SDN based wide area network traffic engineering solution in which traffic prioritization allows Google to reduce costs by fully utilizing WAN bandwidth.

Deconstructing Datacenter Packet Transport describes how priority marking of packets associated with large flows can improve completion times for flows crossing the data center fabric. Simulation results presented in the paper show that prioritization of short flows over large flows can significantly improve throughput (reducing flow completion times by a factor of 5 or more at high loads).

This article demonstrates a self contained real-time Elephant flow marking solution that leverages the real-time visibility and control features available using commodity switch hardware.

The diagram shows the elements of the solution. An instance of the sFlow-RT real-time analytics engine receives streaming sFlow telemetry from a pair of edge routers. A mix of many small flows mixed Continue reading

Real-time network and system metrics as a service

The sFlow-RT real-time analytics engine receives industry standard sFlow telemetry as a continuous stream from network and host devices and coverts the raw data into useful measurements that can be be queried through a REST API. A single sFlow-RT instance can monitor the entire data center, providing a comprehensive view of performance, not just of the individual components, but of the data center as a whole.

This article is an interactive tutorial intended to familiarize the reader with the REST API. The examples can be run on a laptop using recorded data so that access to a live network is not required.

The data was captured from the leaf and spine test network shown above (described in Fabric View).
curl -O https://raw.githubusercontent.com/sflow-rt/fabric-view/master/demo/ecmp.pcap
First, download the captured sFlow data.

You will need to have a system with Java or Docker to run the sFlow-RT software.
curl -O https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz
tar -xzf sflow-rt.tar.gz
./sflow-rt/get-app.sh sflow-rt browse-metrics
./sflow-rt/get-app.sh sflow-rt browse-flows
./sflow-rt/get-app.sh sflow-rt prometheus
./sflow-rt/start.sh -Dsflow.file=$PWD/ecmp.pcap
The above commands download and run sFlow-RT, with browse-metrics, browse-flows, and prometheus applications on a system with Java 1.8+ installed.
docker  Continue reading

NVIDIA, Mellanox, and Cumulus

Recent press releases, Riding a Cloud: NVIDIA Acquires Network-Software Trailblazer Cumulus and NVIDIA Completes Acquisition of Mellanox, Creating Major Force Driving Next-Gen Data Centers, describe NVIDIA's moves to provide high speed data center networks to connect compute clusters that use of their GPUs to accelerate big data workloads, including: deep learning, climate modeling, animation, data visualization, physics, molecular dynamics etc.

Real-time visibility into compute, network, and GPU infrastructure is required manage and optimize the unified infrastructure. This article explores how the industry standard sFlow technology supported by all three vendors can deliver comprehensive visibility.

Cumulus Linux simplifies operations, providing the same operating system, Linux, that runs on the servers. Cumulus Networks and Mellanox have a long history of working with the Linux community to integrate support for switches. The latest Linux kernels now include native support for network ASICs, seamlessly integrating with standard Linux routing (FRR, Quagga, Bird, etc), configuration (Puppet, Chef, Ansible, etc) and monitoring (collectd, netstat, top, etc) tools.

Linux 4.11 kernel extends packet sampling support describes enhancements to the Linux kernel to support industry standard sFlow instrumentation in network ASICs. Cumulus Linux and Mellanox both support the new Linux APIs. Cumulus Linux uses the open source Continue reading

Monitoring DDoS mitigation

Real-time DDoS mitigation using BGP RTBH and FlowSpec and Pushing BGP Flowspec rules to multiple routers describe how to deploy the ddos-protect application. This article focuses on how to monitor DDoS activity and control actions.

The diagram shows the elements of the solution. Routers stream standard sFlow telemetry to an instance of the sFlow-RT real-time analytics engine running the ddos-protect application. The instant a DDoS attack is detected, RTBH and / or Flowspec actions are pushed via BGP to the routers to mitigate the attack. Key metrics are published using the Prometheus exporter format over HTTP and events are sent using the standard syslog protocol.
The sFlow-RT DDoS Protect dashboard, shown above, makes use of the Prometheus time series database and the Grafana metrics visualization tool to track DDoS attack mitigation actions.
The sFlow-RT Countries and Networks dashboard, shown above, breaks down traffic by origin network and country to provide an indication of the source of attacks.  Flow metrics with Prometheus and Grafana describes how to build additional dashboards to provide additional insight into network traffic.
In this example, syslog events are directed to an Elasticsearch, Logstash, and Kibana (ELK) stack where they are archived, queried, and analyzed. Grafana Continue reading

Pushing BGP Flowspec rules to multiple routers

Real-time DDoS mitigation using BGP RTBH and Flowspec describes the open source DDoS Protect application. The software runs on the sFlow-RT real-time analytics engine, which receives industry standard sFlow telemetry from routers and pushes controls using BGP. A recent enhancement to the application pushes controls to multiple routers in order to protect networks with redundant edge routers.
ddos_protect.router=10.0.0.96,10.0.0.97
Configuring multiple BGP connections is simple, the ddos_protect.router configuration option has been extended to accept a comma separated list of IP addresses for the routers that will be connecting to the controller.
Alternatively, a BGP Flowspec/RTBH reflector can be used to propagate the controls. Flowspec is a recent addition to open source BGP software, FRR and Bird, and it should be possible to use this software to reflect Flowspec controls. A reflector can be a useful place to implement policies that direct controls to specific enforcement devices.

Support for multiple BGP connections in the DDoS Protect application reduces the complexity of simple deployments by removing the requirement for a reflector. Controls are pushed to all devices, but differentiated policies can still be implemented by configuring each device's response to controls.

Kubernetes testbed

The sFlow-RT real-time analytics platform receives a continuous telemetry stream from sFlow Agents embedded in network devices, hosts and applications and converts the raw measurements into actionable metrics, accessible through open APIs, see Writing Applications.

Application development is greatly simplified if you can emulate the infrastructure you want to monitor on your development machine. Docker testbed describes a simple way to develop sFlow based visibility solutions. This article describes how to build a Kubernetes testbed to develop and test configurations before deploying solutions into production.
Docker Desktop provides a convenient way to set up a single node Kubernetes cluster, just select the Enable Kubernetes setting and click on Apply & Restart.

Create the following sflow-rt.yml file:
apiVersion: v1
kind: Service
metadata:
name: sflow-rt-sflow
spec:
type: NodePort
selector:
name: sflow-rt
ports:
- protocol: UDP
port: 6343
---
apiVersion: v1
kind: Service
metadata:
name: sflow-rt-rest
spec:
type: LoadBalancer
selector:
name: sflow-rt
ports:
- protocol: TCP
port: 8008
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: sflow-rt
spec:
replicas: 1
selector:
matchLabels:
name: sflow-rt
template:
metadata:
labels:
name: sflow-rt
spec:
containers:
- name: sflow-rt
image: sflow/prometheus:latest
ports:
- name: http
protocol: TCP
containerPort: 8008
- name: sflow
protocol: UDP
containerPort: 6343
Run the Continue reading

SFMIX San Francisco shelter in place

A shelter in place order restricted San Francisco residents to their homes beginning at 12:01 a.m. on March 17, 2020. Many residents work for Bay Area technology companies such as Salesforce, Facebook, Twitter, Google, Netflix and Apple. Employees from these companies are able to, and have been instructed to, work remotely from their homes. In addition, other housebound residents are making use of social networking to keep in touch with friends and family as well as streaming media and online gaming for entertainment.

The traffic trend chart above from the San Francisco Metropolitan Internet Exchange (SFMIX) shows the change in network traffic that has resulted from the shelter in place order. Peak traffic has increased by around 10Gbit/s (a 25% increase) and continues throughout the day (whereas peaks previously occurred in the evenings).

The SFMIX network directly connects a number of data centers in the Bay Area and the member organizations that peer from those data centers.  Peering through the exchange network keeps traffic local by directly connecting companies with their employees and customers and avoiding potentially congested service provider networks.
SFMIX recently finished a network upgrade to 100Gbit/s Arista switches and all fiber Continue reading

Ubuntu 18.04

Ubuntu 18.04 comes with Linux kernel version 4.15. This version of the kernel includes efficient in-kernel packet sampling that can be used to provide network visibility for production servers running network heavy workloads, see Berkeley Packet Filter (BPF).
This article provides instructions for installing and configuring the open source Host sFlow agent to remotely monitor servers using the industry standard sFlow protocol. The sFlow-RT real-time analyzer is used to demonstrate the capabilities of sFlow telemetry.

Find the latest Host sFlow version on the Host sFlow download page.
wget https://github.com/sflow/host-sflow/releases/download/v2.0.25-3/hsflowd-ubuntu18_2.0.25-3_amd64.deb
sudo dpkg -i hsflowd-ubuntu18_2.0.25-3_amd64.deb
sudo systemctl enable hsflowd
The above commands download and install the software.
sflow {
collector { ip=10.0.0.30 }
pcap { speed=1G-1T }
tcp { }
systemd { }
}
Edit the /etc/hsflowd.conf file. The above example sends sFlow to a collector at 10.0.0.30, enables packet sampling on all network adapters, adds TCP performance information, and exports metrics for Linux services. See Configuring Host sFlow for Linux for the complete set of configuration options.
sudo systemctl restart hsflowd
Restart the Host sFlow daemon to start streaming telemetry to Continue reading

Docker testbed

The sFlow-RT real-time analytics platform receives a continuous telemetry stream from sFlow Agents embedded in network devices, hosts and applications and converts the raw measurements into actionable metrics, accessible through open APIs, see Writing Applications.

Application development is greatly simplified if you can emulate the infrastructure you want to monitor on your development machine. Mininet flow analyticsMininet dashboard, and Mininet weathermap describe how to use the open source Mininet network emulator to simulate networks and generate a live stream of standard sFlow telemetry data.

This article describes how to use Docker containers as a development platform. Docker Desktop provides a convenient method of running Docker on Mac and Windows desktops. These instructions assume you have already installed Docker.

Start a Host sFlow agent using the pre-built sflow/host-sflow image:
docker run --rm -d -e "COLLECTOR=host.docker.internal" -e "SAMPLING=10" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
Note: Host, Docker, Swarm and Kubernetes monitoring describes how to deploy Host sFlow agents to monitor large scale container environments.

Start an iperf3 server using the pre-built sflow/iperf3 image:
docker run --rm -d -p 5201:5201 --name iperf3 sflow/iperf3 -s
In a separate terminal window, run the following command to start sFlow-RT:
 Continue reading

CentOS 8

CentOS 8 / RHEL 8 come with Linux kernel version 4.18. This version of the kernel includes efficient in-kernel packet sampling that can be used to provide network visibility for production servers running network heavy workloads, see Berkeley Packet Filter (BPF).
This article provides instructions for installing and configuring the open source Host sFlow agent to remotely monitor servers using the industry standard sFlow protocol. The sFlow-RT real-time analyzer is used to demonstrate the capabilities of sFlow telemetry.

Find the latest Host sFlow version on the Host sFlow download page.
wget https://github.com/sflow/host-sflow/releases/download/v2.0.26-3/hsflowd-centos8-2.0.26-3.x86_64.rpm
sudo rpm -i hsflowd-centos8-2.0.26-3.x86_64.rpm
sudo systemctl enable hsflowd
The above commands download and install the software.
sflow {
collector { ip=10.0.0.30 }
pcap { speed=1G-1T }
tcp { }
systemd { }
}
Edit the /etc/hsflowd.conf file. The above example sends sFlow to a collector at 10.0.0.30, enables packet sampling on all network adapters, adds TCP performance information, and exports metrics for Linux services. See Configuring Host sFlow for Linux for the complete set of configuration options.
sudo systemctl restart hsflowd
Restart the Host sFlow daemon to Continue reading

SONiC

SONiC is part of the Open Compute Project (OCP), creating "an open source network operating system based on Linux that runs on switches from multiple vendors and ASICs." The latest SONiC.201911 release of the open source SONiC network operating system adds sFlow support.
SONiC: sFlow High Level Design
The diagram shows the elements of the implementation.
  1. The open source Host sFlow agent running in the sFlow container monitors the Redis database (in the Database container) for sFlow related configuration changes.
  2. The syncd container monitors the configuration database and pushes hardware settings (packet sampling) to the ASIC using the SAI (Switch Abstraction Inteface) driver (see SAI 1.5).
  3. The ASIC driver hands sampled packet headers and associated metadata captured by the ASIC to user space via the Linux PSAMPLE netlink channel (see Linux 4.11 kernel extends packet sampling support).
  4. The Host sFlow agent receives the PSAMPLE messages and forwards them to configured sFlow collector(s) as standard sFlow packet samples.
  5. In addition, the Host sFlow agent streams telemetry (interface counters and host metrics gathered from the Redis database and Linux kernel) to the collector(s) as standard sFlow counter records.
The following CLI commands enable sFlow Continue reading

Real-time DDoS mitigation using BGP RTBH and FlowSpec

DDoS Protect is a recently released open source application running on the sFlow-RT real-time analytics engine. The software uses streaming analytics to rapidly detect and characterize DDoS flood attacks and automatically applies BGP remote triggered black hole (RTBH) and/or FlowSpec controls to mitigate their impact. The total time to detect and mitigate an attack is in the order of a second.

The combination of multi-vendor standard telemetry (sFlow) and control (BGP FlowSpec) provide the real-time visibility and control needed to quickly and automatically adapt the network to address a range of challenging problems, including: DDoS, traffic engineering, and security.

Solutions are deployable today: Arista BGP FlowSpec describes the recent addition of BGP FlowSpec support to Arista EOS (EOS has long supported sFlow), and sFlow available on Juniper MX series routers describes the release of sFlow support on Juniper MX routers (which have long had BGP FlowSpec support). This article demonstrates DDoS mitigation using Arista EOS. Similar configurations should work with any router that supports sFlow and BGP FlowSpec.
The diagram shows a typical deployment scenario in which an instance of sFlow-RT (running the DDoS Protect application) receives sFlow from the site router (ce-router). A  Continue reading

SAI 1.5

The Open Compute Project (OCP), "is a rapidly growing community of engineers around the world whose mission is to design and enable the delivery of the most efficient server, storage and data center hardware designs available for scalable computing."

The OCP SAI (Switch Abstraction Interface) Project is an important part of the networking effort, defining "a vendor-independent way of controlling forwarding elements, such as a switching ASIC, an NPU or a software switch in a uniform manner." SAI 1.5 Release Notes describe enhancements to existing sFlow API, in particular adding support for the Linux psample netlink channel, see  Linux 4.11 kernel extends packet sampling support. Supporting the standard Linux interface for packet sampling simplifies the implementation of sFlow agents (e.g. Host sFlow) and ensures consistent behavior across hardware platforms to deliver real-time network-wide visibility using industry standard sFlow protocol.

Real-time monitoring at terabit speeds

The Flow Trend chart above shows a real-time, up to the second, view of nearly 3 terabits per second of traffic flowing across the SCinet network, described as the fastest, most powerful volunteer-built network in the world. The network is build each year to support The International Conference for High Performance Computing, Networking, Storage, and Analysis. The SC19 conference is currently underway in Denver, Colorado.
The diagram shows the Joint Big Data Testbed generating the traffic in the chart. The Caltech demonstration is described in NRE-19: SC19 Network Research Exhibition: Caltech Booth 543 Demonstrations Hosting NRE-13, NRE-19, NRE-20, NRE-22, NRE-23, NRE-24, NRE-35:
400GE First Data Networks: Caltech, Starlight/NRL, USC, SCinet/XNET, Ciena, Mellanox, Arista, Dell, 2CRSI, Echostreams, DDN and Pavilion Data, as well as other supporting optical, switch and server vendor partners will demonstrate the first fully functional 3 X400GE local ring network as well as 400GE wide area network ring, linking the Starlight and Caltech booths and Starlight in Chicago. This network will integrate storage using NVMe over Fabric, the latest high throughput methods, in-depth monitoring and realtime flow steering. As part of these demonstrations, we will make use of the latest DWDM, Waveserver Ai, and 400GE as Continue reading
1 3 4 5 6 7 14