Peter

Author Archives: Peter

Configuring OpenSwitch

The following configuration enables sFlow monitoring of all interfaces on a white box switch running the OpenSwitch operating system, sampling packets at 1-in-4096, polling counters every 20 seconds and sending the sFlow to an analyzer (10.0.0.50) on UDP port 6343 (the default sFlow port):
switch(config)# sflow collector 10.0.0.50
switch(config)# sflow sampling 4096
switch(config)# sflow polling 20
switch(config)# sflow enable
A previous posting discussed the selection of sampling rates.  Additional information can be found in the OpenSwitch sFlow User Guide.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Cisco Tetration analytics

Cisco Tetration Analytics: the most Comprehensive Data Center Visibility and Analysis in Real Time, at Scale, June 15, 2016, announced the new Cisco Tetration Analytics platform. The platform collects telemetry from proprietary agents on servers and embedded in hardware on certain Nexus 9k switches, analyzes the data, and presents results via Web GUI, REST API, and as events.

Cisco Tetration Analytics Data Sheet describes the hardware requirements:
Platform Hardware
Quantity
Cisco Tetration Analytics computing nodes (servers)
16
Cisco Tetration Analytics base nodes (servers)
12
Cisco Tetration Analytics serving nodes (servers)
8
Cisco Nexus 9372PX Switches
3

And the power requirements:
Property
Cisco Tetration Analytics Platform
Peak power for Cisco Tetration Analytics Platform (39-RU single-rack option)
22.5 kW
Peak power for Cisco Tetration Analytics Platform (39-RU dual-rack option)
11.25 kW per rack (22.5 KW Total)

No pricing is given, but based on the hardware, data center space, power and cooling requirements, this brute force approach to analytics will be reassuringly expensive to purchase and operate.

Update June 22, 2016: See 451 Research report, Cisco Tetration: a $3m, 1,700-pound appliance for network traffic analytics is born, for pricing information.
A much less expensive alternative is to use industry Continue reading

Programmable hardware: Barefoot Networks, PISA, and P4

Barefoot Networks recently came out of stealth to reveal their  Tofino 6.5Tbit/second (65 X 100GE or 260 X 25GE) fully user-programmable switch. The diagram above, from the talk Programming The Network Data Plane by Changhoon Kim of Barefoot Networks, shows the Protocol Independent Switch Architecture (PISA) of the programmable switch silicon.
A logical switch data-plane described in the P4 language is compiled to program the general purpose PISA hardware. For example, the following P4 code snippet is part of a P4 sFlow implementation:
table sflow_ing_take_sample {
/* take_sample > MAX_VAL_31 and valid sflow_session_id => take the sample */
reads {
ingress_metadata.sflow_take_sample : ternary;
sflow_metadata.sflow_session_id : exact;
}
actions {
nop;
sflow_ing_pkt_to_cpu;
}
}
Network visibility is one of the major use cases for P4 based switches. Improving Network Monitoring and Management with Programmable Data Planes describes how P4 can be used to collect information about latency and queueing in the switch forwarding pipeline.
The document also describes an architecture for In-band Network Telemetry (INT) in which the ingress switch is programmed to insert a header containing measurements to packets entering the network. Each switch in the path is programmed to append additional measurements to the packet header. The Continue reading

Merchant silicon based routing, flow analytics, and telemetry

Drivers for growth describes how switches built on merchant silicon from Broadcom ASICs dominate the current generation of data center switches, reduce hardware costs, and support an open ecosystem of switch operating systems (Cumulus Linux, OpenSwitch, Dell OS10, Broadcom FASTPATH, Pica8 PicOS, Open Network Linux, etc.).

The router market is poised to be similarly disrupted with the introduction of devices based on Broadcom's Jericho ASIC, which has the capacity to handle over 1 million routes in hardware (the full Internet routing table is currently around 600,000 routes).
An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive. - Leaving Fixed Function Switches Behind For Universal Leafs
Broadcom Jericho ASICs are currently available in Arista 7500R/7280R routers and in Cisco NCS 5000 series routers. Expect further disruption Continue reading

Docker networking with IPVLAN and Cumulus Linux

Macvlan and Ipvlan Network Drivers are being added as Docker networking options. The IPVlan L3 Mode shown in the diagram is particularly interesting since it dramatically simplifies the network by extending routing to the hosts and eliminating switching entirely.

Eliminating the complexity associated with switching broadcast domains, VLANs, spanning tree, etc. allows a purely routed network to be easily scaled to very large sizes. However, there are some challenges to overcome:
IPVlan will require routes to be distributed to each endpoint. The driver only builds the Ipvlan L3 mode port and attaches the container to the interface. Route distribution throughout a cluster is beyond the initial implementation of this single host scoped driver. In L3 mode, the Docker host is very similar to a router starting new networks in the container. They are on networks that the upstream network will not know about without route distribution.
Cumulus Networks has been working to simplify routing in the ECMP leaf and spine networks and the white paper Routing on the Host: An Introduction shows how the routing configuration used on Cumulus Linux can be extended to the hosts.

Update June 2, 2016: Routing on the Host contains packaged versions of the Continue reading

Streaming telemetry

The OpenConfig project has been getting a lot of attention lately.  A number of large network operators, lead by Google, are developing "a consistent set of vendor-neutral data models (written in YANG) based on actual operational needs from use cases and requirements from multiple network operators."

The OpenConfig project extends beyond configuration, "Streaming telemetry is a new paradigm for network monitoring in which data is streamed from devices continuously with efficient, incremental updates. Operators can subscribe to the specific data items they need, using OpenConfig data models as the common interface."

Anees Shaikh's Network Field Day talk provides an overview of OpenConfig and includes an example that demonstrates how configuration and state are combined in a single YANG data model. In the example, read/write config attributes used to configure a network interface (name, description, MTU, operational state) are combined with the state attributes needed to verify the configuration (MTU, name, description, oper-status, last-change) and collect metrics (in-octets, in-ucast-pkts, in-broadcast-pkts, ...).

Anees positions OpenConfig streaming telemetry mechanism as an attractive alternative to polling for metrics using Simple Network Management Protocol (SNMP) - see Push vs Pull for a detailed comparison between pushing (streaming) and pulling (polling) metrics.

Streaming telemetry is Continue reading

Internet of Things (IoT) telemetry

The internet of things (IoT) is the network of physical objects—devices, vehicles, buildings and other items—embedded with electronics, software, sensors, and network connectivity that enables these objects to collect and exchange data. - ITU

The recently released Raspberry Pi Zero (costing $5) is an example of the type of embedded low power computer enabling IoT. These small devices are typically wired to one or more sensors (measuring temperature, humidity, location, acceleration, etc.) and embedded in or attached to physical devices.

Collecting real-time telemetry from large numbers of small devices that may be located within many widely dispersed administrative domains poses a number of challenges, for example:
  • Discovery - How are newly connected devices discovered?
  • Configuration - How can the numerous individual devices be efficiently configured?
  • Transport - How efficiently are measurements transported and delivered?
  • Latency - How long does it take before measurements are remotely accessible? 
This article will use the Raspberry Pi as an example to explore how the architecture of the industry standard sFlow protocol and its implementation in the open source Host sFlow agent provide a method of addressing the challenges of embedded device monitoring.

The following steps describe how to install the Host sFlow Continue reading

OVS Orbit podcast with Ben Pfaff

OVS Orbit Episode 6 is a wide ranging discussion between Ben Pfaff and Peter Phaal of the industry standard sFlow measurement protocol, implementation of sFlow in Open vSwitch, network analytics use cases and application areas supported by sFlow, including: OpenStack, Open Network Virtualization (OVN), DDoS mitigation, ECMP load balancing, Elephant and Mice flows, Docker containers, Network Function Virtualization (NFV), and microservices.

Follow the link to see listen to the podcast, read the extensive show notes, follow related links, and to subscribe to the podcast.

Raspberry Pi real-time network analytics

The Raspberry Pi model 3b is not much bigger than a credit card, costs $35, runs Linux, has a 1G RAM, and powerful 4 core 64 bit ARM processor. This article will demonstrate how to turn the Raspberry Pi into a Terribit/second real-time network analytics engine capable of monitoring hundreds of switches and thousands of switch ports.
The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from industry standard sFlow instrumentation build into network, server and application infrastructure and delivers analytics through APIs and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.
A future article will examine how the Host sFlow agent can be used to efficiently stream measurements from large numbers of inexpensive Rasberry Pi devices ($5 for model Zero) to the sFlow-RT collector to monitor and control the "Internet of Things" (IoT).
The following instructions show how to install sFlow-RT on Raspbian Jesse (the Debian Linux based Raspberry Pi operating system).
wget http://www.inmon.com/products/sFlow-RT/sflow-rt_2.0-1092.deb
sudo dpkg -i --ignore-depends=openjdk-7-jre-headless sflow-rt_2.0-1092.deb
We are ignoring the dependency on openjdk and will use the default Raspbian Java 1.8 version Continue reading

OpenNSL

Open Network Switch Layer (OpenNSL) is a library of network switch APIs that is openly available for programming Broadcom network switch silicon based platforms. These open APIs enable development of networking application software based on Broadcom network switch architecture based platforms.

The recent inclusion of the APIs needed to enable sFlow instrumentation in Broadcom hardware allows open source network operating systems such as OpenSwitch and Open Network Linux to implement the sFlow telemetry standard.

Mininet dashboard

Mininet Dashboard has been released on GitHub, https://github.com/sflow-rt/mininet-dashboard. Follow the steps in Mininet flow analytics to install sFlow-RT and configure sFlow instrumentation in Mininet.

The following steps install the dashboard and start sFlow-RT:
cd sflow-rt
./get-app.sh sflow-rt mininet-dashboard
./start.sh
The dashboard web interface shown in the screen shot should now be accessible. Run a test to see data in the dashboard. The following test created the results shown:
sudo mn --custom extras/sflow.py --link tc,bw=10 --topo tree,depth=2,fanout=2 --test iperf
The dashboard has three time series charts that update every second and show five minutes worth of data. From top to bottom, the charts are:
  1. Top Flows - Click on a peak in the chart to see the flows that were active at that time.
  2. Top Ports - Click on a peak in the chart to see the ingress ports that were active at that time.
  3. Topology Diameter - The diameter of the topology.
The dashboard application is easily modified to add additional metrics, generate events, or implement controls. For example, adding the following code to the end of the sflow-rt/app/mininet-dashboard/scripts/metrics.js file implements equivalent functionality to the large flow detection Python script described in Mininet flow analytics Continue reading

Mininet flow analytics

Mininet is free software that creates a realistic virtual network, running real kernel, switch and application code, on a single machine (VM, cloud or native), in seconds. Mininet is useful for development, teaching, and research. Mininet is also a great way to develop, share, and experiment with OpenFlow and Software-Defined Networking systems.

This article shows how standard sFlow instrumentation built into Mininet can be combined with sFlow-RT analytics software to provide real-time traffic visibility for Mininet networks. Augmenting Mininet with sFlow telemetry realistically emulates the instrumentation built into most vendor's switch hardware, provides visibility into Mininet experiments, and opens up new areas of research (e.g. SDN and large flows).

The following papers are a small selection of projects using sFlow-RT:
In order Continue reading

Identifying bad ECMP paths

In the talk Move Fast, Unbreak Things! at the recent DevOps Networking Forum,  Petr Lapukhov described how Facebook has tackled the problem of detecting packet loss in Equal Cost Multi-Path (ECMP) networks. At Facebook's scale,  there are many parallel paths and actively probing all the paths generates a lot of data. The active tests generate over 1Terabits/second of measurement data per Facebook data center and a Hadoop cluster with hundreds of compute nodes is required per data center to process the data.

Processing active test data can detect that packets are being lost within approximately 20 seconds, but doesn't provide the precise location where packets are dropped. A custom multi-path traceroute tool (fbtracert) is used to follow up and narrow down the location of the packet loss.

While described as measuring packet loss, the test system is really measuring path loss. For example, if there are 64 ECMP paths in a pod, then the loss of one path would result in a packet loss of approximately 1 in 64 packets in traffic flows that cross the ECMP group.

Black hole detection describes an alternative approach. Industry standard sFlow instrumentation embedded within most vendor's switch hardware provides visibility into the Continue reading

Black hole detection

The Broadcom white paper, Black Hole Detection by BroadView™ Instrumentation Software, describes the challenge of detecting and isolating packet loss caused by inconsistent routing in leaf-spine fabrics. The diagram from the paper provides an example, packets from host H11 to H22 are being forwarded by ToR1 via Spine1 to ToR2 even though the route to H22 has been withdrawn from ToR2. Since ToR2 doesn't have a route to the host, it sends the packet back up to Spine 2, which will send the packet back to ToR2, causing the packet to bounce back and forth until the IP time to live (TTL) expires.

The white paper discusses how Broadcom ASICs can be programmed to detect blackholes based on packet paths, i.e. packets arriving at a ToR switch from a Spine switch should never be forwarded to another Spine switch.

This article will discuss how the industry standard sFlow instrumentation (also included in Broadcom based switches) can be used to provide fabric wide detection of black holes.

The diagram shows a simple test network built using Cumulus VX virtual machines to emulate a four switch leaf-spine fabric like the one described in the Broadcom white paper (this network is Continue reading

sFlow to IPFIX/NetFlow

RESTflow explains how the sFlow architecture shifts the flow cache from devices to external software and describes how the sFlow-RT REST API can be used to program and query flow caches. Exporting events using syslog describes how flow records can be exported using the syslog protocol to Security Information and Event Management (SIEM) tools such as Logstash and and Splunk. This article demonstrates how sFlow-RT can be used to define and export the flows using the IP Flow Information eXport (IPFIX) protocol (the IETF standard based on NetFlow version 9).

For example, the following command defines a cache that will maintain flow records for TCP flows on the network, capturing IP source and destination addresses, source and destination port numbers and the bytes transferred and sending flow records to address 10.0.0.162:
curl -H "Content-Type:application/json" -X PUT --data  '{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport", 
"value":"bytes", "ipfixCollectors":["10.0.0.162"]}'
http://localhost:8008/flow/tcp/json
Running Wireshark's tshark command line utility on 10.0.0.162 verifies that flows are being received:
# tshark -i eth0 -V udp port 4739
Running as user "root" and group "root". This could be dangerous.
Capturing on lo
Frame 1 (134 bytes on wire, 134 bytes captured)
Arrival Time: Continue reading

Berkeley Packet Filter (BPF)

Linux bridge, macvlan, ipvlan, adapters discusses how industry standard sFlow technology, widely supported by data center switch vendors, has been extended to provide network visibility into the Linux data plane. This article explores how sFlow's lightweight packet sampling mechanism has been implemented on Linux network adapters.

Linux Socket Filtering aka Berkeley Packet Filter (BPF) describes the recently added prandom_u32() function that allows packets to be randomly sampled in the Linux kernel for efficient monitoring of production traffic.
Background: Enhancing Network Intrusion Detection With Integrated Sampling and Filtering, Jose M. Gonzalez and Vern Paxson, International Computer Science Institute Berkeley, discusses the motivation for adding random sampling BPF and the email thread [PATCH] filter: added BPF random opcode describes the Linux implementation and includes an interesting discussion of the motivation for the patch.
The following code shows how the open source Host sFlow agent implements random 1-in-256 packet sampling as a BPF program:
ld rand
mod #256
jneq #1, drop
ret #-1
drop: ret #0
A JIT for packet filters discusses the Linux Just In Time (JIT) compiler for BFP programs, delivering native machine code performance for compiled filters.

Minimizing cost of visibility describes why low overhead monitoring is an Continue reading

Cisco SF250, SG250, SF350, SG350, SG350XG, and SG550XG series switches

Software version 2.1.0 adds sFlow support to Cisco 250 Series Smart Switches, 350 Series Smart Switches and 550X Series Stackable Managed Switches.
Cisco network engineers might not be familiar with the multi-vendor sFlow technology since it is a relatively new addition to Cisco products. The article, Cisco adds sFlow support, describes some of the key features of sFlow and contrasts them to Cisco NetFlow.
Configuring sFlow on the switches is straightforward. For example, The following commands configure a switch to sample packets at 1-in-1024, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) over UDP using the default sFlow port (6343):
sflow receiver 1 10.0.0.50
For each interface:
sflow flow-sampling 1024 1
sflow counter-sampling 30 1
A previous posting discussed the selection of sampling rates. Additional information can be found on the Cisco web site.

Trying out sFlow offers suggestions for getting started with sFlow monitoring and reporting. The article recommends the sFlowTrend analyzer as a way to get started since it is a free, purpose built sFlow analyzer that delivers the full capabilities of sFlow instrumentation in the Cisco switches.

Multi-tenant sFlow

This article discusses how real-time sFlow telemetry can be shared with network tenants to provide each tenant with a real-time view of their slice of the shared resources. The diagram shows a simple network with two tenants, Tenant A and Tenant B, each assigned their own subnet, 10.0.0.0/24 and 10.0.1.0/24 respectively.

One option would be to simply replicate the sFlow datagrams and send copies to both tenants. Forwarding using sflowtool describes how sflowtool can be used to replicate and forward sFlow and sFlow-RT can be configured to forward sFlow using its REST API:
curl -H "Content-Type:application/json" \
-X PUT --data '{"address":"10.0.0.1","port":6343}' \
http://127.0.0.1:8008/forwarding/TenantA/json
However, there are serious problems with this approach:
  1. Private information about Tenant B's traffic is leaked to Tenant A.
  2. Information from internal links within the network (i.e. links between s1, s2, s3 and s4) is leaked to Tenant A.
  3. Duplicate data from each network hop is likely to cause Tenant A to over-estimate their traffic.
The sFlow-RT multi-tenant forwarding function addresses these challenges. The first task is to provide sFlow-RT with an accurate network topology specifying the internal links connecting the Continue reading

Network visibility with Docker

Microservices describes the critical role that network visibility provides as a common point of reference for monitoring, managing and securing the interactions between the numerous and diverse distributed service instances in a microservices deployment.

Industry standard sFlow is well placed to give network visibility into the Docker infrastructure used to support microservices. The sFlow standard is widely supported by data center switch vendors (Cisco, Arista, Juniper, Dell, HPE, Brocade, Cumulus, etc.)  providing a cost effective and scaleable method of monitoring the physical network infrastructure. In addition, Linux bridge, macvlan, ipvlan, adapters described how sFlow is also an efficient means of leveraging instrumentation built into the Linux kernel to extend visibility into Docker host networking.

The following commands build the Host sFlow binary package from sources on an Ubuntu 14.04 system:
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install libpcap-dev
sudo apt-get install wget
wget https://github.com/sflow/host-sflow/archive/v1.29.1.tar.gz
tar -xvzf v1.29.1.tar.gz
cd host-sflow-1.29.1
make DOCKER=yes PCAP=yes deb
This resulting hsflowd_1.29.1-1_amd64.deb package can be copied and installed on all the hosts in the Docker cluster using configuration management tools such as Puppet, Chef, Ansible, etc.

This Continue reading

Lasers!

Cool! You mean that I actually have frickin' switches with frickin' laser beams attached to their frickin' ports?

Dr. Evil is right, lasers are cool! The draft sFlow Optical Interface Structures specification exports metrics gathered from instrumentation built into Small Form-factor Pluggable (SFP) and Quad Small Form-factor Pluggable (QSFP) optics modules. This article provides some background on optical modules and discusses the value of including optical metrics in the sFlow telemetry stream exported from switches and hosts.
Pluggable optical modules are intelligent devices that do more than simply convert between optical and electrical signals. The functional diagram below shows the elements within a pluggable optical module.
The transmit and receive functions are shown in the upper half of the diagram. Incoming optical signals are received on a fiber and amplified as they are converted to electrical signals that can be handled by the switch. Transmit data drives the modulation of a laser diode which transmits the optical signal down a fiber.

The bottom half of the diagram shows the management components. Power, voltage and temperature sensors are monitored and the results are written into registers in an EEPROM that are accessible via a management interface.

The proposed sFlow extension standardizes Continue reading
1 7 8 9 10 11 13