Peter

Author Archives: Peter

Systemd traffic marking

Monitoring Linux services describes how the open source Host sFlow agent exports metrics from services launched using systemd, the default service manager on most recent Linux distributions. In addition, the Host sFlow agent efficiently samples network traffic using Linux kernel capabilities: PCAP/BPF, nflog, and ulog.

This article describes a recent extension to the Host sFlow systemd module, mapping sampled traffic to the individual services the generate or consume them. The ability to color traffic by application greatly simplifies service discovery and service dependency mapping; making it easy to see how services communicate in a multi-tier application architecture.

The following /etc/hsflowd.conf file configures the Host sFlow agent, hsflowd, to sampling packets on interface eth0, monitor systemd services and mark the packet samples, and track tcp performance:
sflow {
collector { ip = 10.0.0.70 }
pcap { dev = eth0 }
systemd { markTraffic = on }
tcp { }
}
The diagram above illustrates how the Host sFlow agent is able to efficiently monitor and classify traffic. In this case both the Host sFlow agent and an Apache web server are are running as services managed by systemd. A network connection , shown in Continue reading

Microsoft Office 365

Office 365 IP Address and URL Web service describes a simple REST API that can be used to query for the IP address ranges associated with Microsoft Office 365 servers.

This information is extremely useful, allowing traffic analytics software to combine telemetry obtained from network devices with information obtained using the Microsoft REST API  in order to identifying clients, links, and devices carrying the traffic, as well as any issues, such as link errors, and congestion,  that may be impacting performance.
The sFlow-RT analytics engine is programmable and includes a REST client that can be used to query the Microsoft API and combine the information with industry standard sFlow telemetry from network devices. The following script, office365.js, provides a simple example:
var api = 'https://endpoints.office.com/endpoints/worldwide';

function uuidv4() {
return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function(c) {
var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
return v.toString(16);
});
}

var reqid = uuidv4();

function updateAddressMap() {
var res, i, ips, id, groups;
try { res = http(api+'?clientrequestid='+reqid); }
catch(e) { logWarning('request failed ' + e); }
if(res == null) return;
res = JSON.parse(res);
groups Continue reading

Northbound Networks Zodiac GX

Mininet is widely used to emulate software defined networks (SDNs). Mininet flow analytics describes how standard sFlow telemetry, from Open vSwitch used by Mininet emulate the network, provides feedback to an SDN controller, allowing the controller to adapt the network to changing traffic, for example, to mitigate a distributed denial of service (DDoS) attack.

Northbound Networks Zodiac GX is an inexpensive open source software based switch that is ideal for experimenting with software defined networking (SDN) in a physical network setting. The small fanless package makes the switch an attractive option for desktop use. The Zodiac GX is also based on Open vSwitch, making it easy to take SDN control strategies developed on Mininet.
Enabling sFlow on the Zodiac GX is easy, navigate to the System>Startup page and add the following line to the end of the startup script (before the exit 0 line):
ovs-vsctl -- --id=@sflow create sflow agent=$OVS_BR target=$IP_CONTROLLER_1 sampling=100 polling=10 -- set bridge $OVS_BR sflow=@sflow
Reboot the switch for the changed to take effect.

Use sflowtool to verify that sFlow is arriving at the controller host and to examine the contents of the telemetry stream. Running sflowtool using Docker is a simple alternative to building the software Continue reading

RDMA over Converged Ethernet (RoCE)

RDMA over Converged Ethernet is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. One of the benefits running RDMA over Ethernet is the visibility provided by standard sFlow instrumentation embedded in the commodity Ethernet switches used to build data center leaf and spine networks where RDMA is most prevalent.

The sFlow telemetry stream includes packet headers, sampled at line rate by the switch hardware. Hardware packet sampling allows the switch to monitor traffic at line rate on all ports, keeping up with the high speed data transfers associated with RoCE.

The diagram above shows the packet headers associated with RoCEv1 and RoCEv2 packets. Decoding the InfiniBand Global Routing Header (IB GRH) and InfiniBand Base Transport Header (IB BTH) allows an sFlow analyzer to report in detail on RoCE traffic.
The sFlow-RT real-time analytics engine recently added support for RoCE by decoding InfiniBand Global Routing and InfiniBand Base Transport fields. The screen capture of the sFlow-RT Flow-Trend application shows traffic associated with an RoCEv2 connection between two hosts, 10.10.2.22 and 10.10.2.52. The traffic consists of SEND and ACK messages exchanged as part of a reliable connection (RC Continue reading

ExtremeXOS 22.5.1 adds support Broadcom ASIC table utilization statistics

ExtremeXOS 22.5.1 is now available! describes added support in sFlow for "New data structures to support reporting on hardware/table utilization statistics." The feature is available on Summit X450-G2, X460-G2, X670-G2, X770, and ExtremeSwitching X440-G2, X870, X620, X690 series switches.

Figure 1 shows the packet processing pipeline of a Broadcom ASIC. The pipeline consists of a number of linked hardware tables providing bridging, routing, access control list (ACL), and ECMP forwarding group functions. Operations teams need to be able to proactively monitor table utilizations in order to avoid performance problems associated with table exhaustion.

Broadcom's sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scaleable, multi-vendor, network wide visibility into the utilization of these hardware tables.

The following output from the open source sflowtool command line utility shows the raw table measurements (this is in addition to the extensive set of measurements already exported via sFlow by ExtremeXOS):
bcm_asic_host_entries 4
bcm_host_entries_max 8192
bcm_ipv4_entries 0
bcm_ipv4_entries_max 0
bcm_ipv6_entries 0
bcm_ipv6_entries_max 0
bcm_ipv4_ipv6_entries 9
bcm_ipv4_ipv6_entries_max 16284
bcm_long_ipv6_entries 3
bcm_long_ipv6_entries_max 256
bcm_total_routes 10
bcm_total_routes_max 32768
bcm_ecmp_nexthops 0
bcm_ecmp_nexthops_max 2016
bcm_mac_entries 3
bcm_mac_entries_max 32768
bcm_ipv4_neighbors 4
bcm_ipv6_neighbors 0
bcm_ipv4_routes 0
bcm_ipv6_routes 0
bcm_acl_ingress_entries Continue reading

Visualizing real-time network traffic flows at scale

Particle has been released on GitHub, https://github.com/sflow-rt/particle. The application is a real-time visualization of network traffic in which particles flow between hosts arranged around the edges of the screen. Particle colors represent different types of traffic.

Particles provide an intuitive representation of network packets transiting the network from source to destination. The animation slows time so that the particle takes 10 seconds (instead of milliseconds) to transit the network. Groups of particles traveling the same path represent flows of packets between the hosts. Particle size and frequency are used to indicate the intensity of the traffic flowing on a path.

Particles don't follow straight lines, instead following quadratic Bézier curves around the center of the screen. Warping particle paths toward the center of the screen ensures that all paths are of similar length and visible - even if the start and end points are on the same axis.

The example above is from a site with over 500 network switches carrying hundreds of Gigabits of traffic. Internet, Customer, Site and Datacenter hosts have been assigned to the North, East, South and West sides respectively.
The screen is updated 60 times per second for smooth animation. Active Continue reading

sFlow available on Juniper PTX series routers


sFlow functionality introduced on the PTX1000 and PTX10000 platforms—Starting in Junos OS Release 18.2R1, the PTX1000 and PTX10000 routers support sFlow, a network monitoring protocol for high-speed networks. With sFlow, you can continuously monitor tens of thousands of ports simultaneously. The mechanism used by sFlow is simple, not resource intensive, and accurate.  - New and Changed Features

The recent article, sFlow available on Juniper MX series routers, describes how Juniper is extending sFlow support to include routers to provide visibility across their entire range of switching and routing products.

Universal support for industry standard sFlow as a base Junos feature reduces the operational complexity and cost of network visibility for enterprises and service providers. Real-time streaming telemetry from campus switches, routers, and data center switches, provides centralized, real-time, end-to-end visibility needed to troubleshoot, optimize, and account for network usage.

Analytics software is a critical factor in realizing the full benefits of sFlow monitoring. Choosing an sFlow analyzer discusses important factors to consider when selecting from the range of open source and commercial sFlow analysis tools.

SDKLT

Logical Table Software Development Kit (SDKLT) is a new, powerful, and feature rich Software Development Kit (SDK) for Broadcom switches. SDKLT provides a new approach to switch configuration using Logical Tables.

Building the Demo App describes how to get started using a simulated Tomahawk device. Included, is a CLI that can be used to explore tables. For example, the following CLI output shows the attributes of the sFlow packet sampling table:
BCMLT.0> lt list -d MIRROR_PORT_ENCAP_SFLOW
MIRROR_PORT_ENCAP_SFLOW
Description: The MIRROR_PORT_ENCAP_SFLOW logical table is used to specify
per-port sFlow encapsulation sample configuration.
11 fields (1 key-type field):
SAMPLE_ING_FLEX_RATE
Description: Sample ingress flex sFlow packet if the generated sFlow random
number is greater than the threshold. A lower threshold leads to
higher sampling frequency.
SAMPLE_EGR_RATE
Description: Sample egress sFlow packet if the generated sFlow random number is
greater than the threshold. A lower threshold leads to
higher sampling frequency.
SAMPLE_ING_RATE
Description: Sample ingress sFlow packet if the generated sFlow random number is
greater than the threshold. A lower threshold leads to
higher sampling frequency.
SAMPLE_ING_FLEX_MIRROR_INSTANCE
Description: Enable to copy ingress flex sFlow packet samples to the ingress
mirror member using the sFlow mirror instance configuration.
SAMPLE_ING_FLEX_CPU
Description: Enable to copy ingress flex Continue reading

sFlow available on Juniper MX series routers

sFlow support on MX Series devices—Starting in Junos OS Release 18.1R1, you can configure sFlow technology (as a sFlow agent) on a MX Series device, to continuously monitor traffic at wire speed on all interfaces simultaneously. The sFlow technology is a monitoring technology for high-speed switched or routed networks.  - New and Changed Features

Understanding How to Use sFlow Technology for Network Monitoring on a MX Series Router lists the following benefits of sFlow Technology on a MX Series Router:
  • sFlow can be used by software tools like a network analyzer to continuously monitor tens of thousands of switch or router ports simultaneously.
  • Since sFlow uses network sampling (forwarding one packet from ‘n’ number of total packets) for analysis, it is not resource intensive (for example processing, memory and more). The sampling is done at the hardware application-specific integrated circuits (ASICs) and hence it is simple and more accurate.
With the addition of the MX series routers, Juniper now supports sFlow across its entire product range:
Universal support for Continue reading

ONOS measurement based control

ONOS traffic analytics describes how to run the ONOS SDN controller with a virtual network created using Mininet. The article also showed how to monitor network traffic using industry standard sFlow instrumentation available in Mininet and in physical switches.
This article uses the same ONOS / Mininet test bed to demonstrate how sFlow-RT real-time flow analytics can be used to push controls to the network through the ONOS REST API.  Leaf and spine traffic engineering using segment routing and SDN used real-time flow analytics to load balance an ONOS controlled physical network. In this example, we will use ONOS to filter DDoS attack traffic on a Mininet virtual network.

The following sFlow-RT script, ddos.js, detects DDoS attacks and programs ONOS filter rules to block the attacks:
var user = 'onos';
var password = 'rocks';
var onos = '192.168.123.1';
var controls = {};

setFlow('udp_reflection',
{keys:'ipdestination,udpsourceport',value:'frames'});
setThreshold('udp_reflection_attack',
{metric:'udp_reflection',value:100,byFlow:true,timeout:2});

setEventHandler(function(evt) {
// don't consider inter-switch links
var link = topologyInterfaceToLink(evt.agent,evt.dataSource);
if(link) return;

// get port information
var port = topologyInterfaceToPort(evt.agent,evt.dataSource);
if(!port) return;

// need OpenFlow info to create ONOS filtering rule
if(!port.dpid || !port.ofport) return;

// we already have Continue reading

ONOS traffic analytics

Open Network Operating System (ONOS) is "a software defined networking (SDN) OS for service providers that has scalability, high availability, high performance, and abstractions to make it easy to create applications and services." The open source project is hosted by the Linux Foundation.

Mininet and onos.py workflow describes how to run ONOS using the Mininet network emulator. Mininet allows virtual networks to be quickly constructed and is a simple way to experiment with ONOS. In addition, Mininet flow analytics describes how to enable industry standard sFlow streaming telemetry in Mininet, proving a simple way monitor traffic in the ONOS controlled network.

For example, the following command creates a Mininet network, controlled by ONOS, and monitored using sFlow:
sudo mn --custom ~/onos/tools/dev/mininet/onos.py,sflow-rt/extras/sflow.py \
--link tc,bw=10 --controller onos,1 --topo tree,2,2
The screen capture above shows the network topology in the ONOS web user interface.
Install Mininet dashboard to visualize the network traffic. The screen capture above shows a large flow over the same topology being displayed by ONOS, see Mininet weathermap for more examples.

In this case, the traffic was created by the following Mininet command:
mininet-onos> iperf h1 h3
The screen capture above shows top flows, busiest Continue reading

Real-time baseline anomaly detection

The screen capture demonstrates the real-time baseline and anomaly detection based on industry standard sFlow streaming telemetry. The chart was generated using sFlow-RT analytics software. The blue line is an up to the second measure of traffic (measured in Bits per Second). The red and gold lines represent dynamic upper and lower limits calculated by the baseline function. The baseline function flags "high" and "low" value anomalies when values move outside the limits. In this case, a "low" value anomaly was flagged for the drop in traffic shown in the chart.

Writing Applications provides a general introduction to sFlow-RT programming. The baseline functionality is exposed through through the JavaScript API.

Create new baseline
baselineCreate(name,window,sensitivity,repeat);
Where:
  • name, name used to reference baseline.
  • window, the number of previous intervals to consider in calculating the limits.
  • sensitivity, the number of standard deviations used to calculate the limits.
  • repeat, the number of successive data points outside the limits before flagging anomaly 
In this example, baseline parameter values were window=180 (seconds), sensitivity=2, and repeat=3.

Update baseline
var status = baselineCheck(name,value);
Where:
  • status, "learning" while baseline is warming up (takes window intervals),  "normal" if value is in expected range, "low" Continue reading

Flow smoothing

The sFlow-RT real-time analytics engine includes statistical smoothing. The chart above illustrates the effect of different levels of smoothing when analyzing real-time sFlow telemetry.

The traffic generator in this example creates an alternating pattern: 1.25Mbytes/second for 30 seconds followed by a pause of 30 seconds. Smoothing time constants between 1 second and 500 seconds have been applied to generate the family of charts. The blue line is the result of 1 second smoothing and closely tracks the traffic pattern. At the other extreme, the dark red line is the result of 500 second smoothing, showing a constant 625Kbytes/second (the average of the waveform).

There is a tradeoff between responsiveness and variability (noise) when selecting the level of smoothing. Selecting a suitable smoothing level depends on the flow analytics application.

Low smoothing values are appropriate when fast response is required, for example:
Higher smoothing values are appropriate when less variability is desirable, for example:

Generating the chart

The results described in this article are easily reproduced using the testbed Continue reading

Mininet weathermap

Mininet dashboard is a real-time dashboard displaying traffic information from Mininet virtual networks. The screen capture demonstrates the real-time network weather map capability that was recently added to the dashboard. The torus topology is displayed and link widths are updated every second to reflect traffic. In this example a large flow between switches s1x1 and s3x3 is routed via s1x3.

The network was created using the following Mininet command:
sudo mn --custom=sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo torus,3,3 --switch ovsbr,stp=1 --test iperf
In the screen capture above you can clearly see the large flow traversing switches, s4, s3, s2, s1, s9, s13, and s15 in a tree topology. The network was created using the following command:
sudo mn --custom sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo tree,depth=4,fanout=2 --test iperf
The screen capture above shows a large flow traversing switches s1, s2, s3, and s4 in a linear topology. The network was created using the following command:
sudo mn --custom sflow-rt/extras/sflow.py --link tc,bw=10 \
--topo linear,4 --test iperf
It's also easy to create Custom Topologies. The following command creates the example custom topology, topo-2sw-2host.py, that ships with Mininet:
sudo mn --custom ~/mininet/custom/topo-2sw-2host.py,sflow-rt/extras/sflow.py  Continue reading

OCP Summit 2018

Network telemetry was a popular topic at the recent OCP U.S. Summit 2018 in San Jose, California, with an entire afternoon track of the two day conference devoted to the subject. Videos of the talks should soon be posted on the conference web site.

The following articles on this blog cover related topics:
In addition, there were a couple of live sFlow telemetry demonstration in the conference exhibit hall.
The first was a demonstration of leaf and spine fabric visibility using white box switches running the open source Linux Foundation OpenSwitch network operating system. OpenSwitch describes how the open source Host sFlow agent enables standard sFlow instrumentation in merchant silicon based white box switches using OpenSwitch Control Plane Services (CPS), which in turn programs the silicon using the OCP Switch Abstraction Interface (SAI).

The rack in the booth contains a two spine, five leaf network. Each of the switches in the network Continue reading

Prometheus and Grafana

Prometheus is an open source time series database optimized to collect large numbers of metrics from cloud infrastructure. This article will explore how industry standard sFlow telemetry streaming supported by network devices and Host sFlow agents (Linux, Windows, FreeBSD, AIX, Solaris, Docker, Systemd, Hyper-V, KVM, Nutanix AHV, Xen) can be integrated with Prometheus.

The diagram above shows the elements of the solution: sFlow telemetry streams from hosts and switches to an instance of sFlow-RT. The sFlow-RT analytics software converts the raw measurements into metrics that are accessible through a REST API.

The following prometheus.php script mediates between the Prometheus metrics export protocol and the sFlow-RT REST API.  HTTP queries from Prometheus are translated into calls to the sFlow-RT REST API and JSON responses are converted into Prometheus metrics.
<?php
header('Content-Type: text/plain');
if(isset($_GET['labels'])) {
$keys = htmlspecialchars($_GET["labels"]);
}
$vals = htmlspecialchars($_GET["values"]);
if(isset($keys)) {
$cols = $keys.','.$vals;
} else {
$cols = $vals;
}
$key_arr = explode(",",$keys);
$result = file_get_contents('http://localhost:8008/table/ALL/'.$cols.'/json');
$obj = json_decode($result,true);
foreach ($obj as $row) {
unset($labels);
foreach ($row as $cell) {
if(!isset($labels)) {
$labels = 'agent="'.$cell['agent'].'",datasource="'.$cell['dataSource'].'"';
}
$name = $cell['metricName'];
$val = $cell['metricValue'];
if(in_array($name,$key_arr)) {
$labels .= Continue reading

OpenSwitch

OpenSwitch is a Linux Foundation project providing an open source white box control plane running on a standard Linux distribution. The diagram above shows the OpenSwitch architecture.

This article describes how to enable industry standard sFlow telemetry using the open source Host sFlow agent. The Host sFlow agent uses Control Plane Services (CPS) to configure sFlow instrumentation in the hardware and gather metrics. CPS in turn uses the Open Compute Project (OCP) Switch Abstraction Interface (SAI) as a vendor independent method of configuring the hardware. Hardware support for sFlow is a standard feature supported by Network Processing Unit (NPU) vendors (Barefoot, Broadcom, Cavium, Innovium, Intel, Marvell, Mellanox, etc.) and vendor neutral sFlow configuration is part of the SAI.

Installing and configuring Host sFlow agent

Installing the software is simple. Log into the switch and type the following commands:
wget --no-check-certificate https://github.com/sflow/host-sflow/releases/download/v2.0.17-1/hsflowd-opx_2.0.17-1_amd64.deb
sudo dpkg -i hsflowd-opx_2.0.17-1_amd64.deb
The sFlow agent requires very little configuration, automatically monitoring all switch ports using the following default settings:

Link SpeedSampling RatePolling Continue reading

Intranet DDoS attacks

As on a Darkling Plain: Network Survival in an Age of Pervasive DDoS talk by Steinthor Bjarnason at the recent NANOG 71 conference. The talk discusses the threat that the proliferation of network connected devices in enterprises create when they are used to launch denial of service attacks. Last year's Mirai attacks are described, demonstrating the threat posed by mixed mode attacks where a compromised host is used to infect large numbers devices on the corporate network.
The first slide from the talk shows a denial attack launched against an external target, launched from infected video surveillance cameras scattered throughout the the enterprise network. The large volume of traffic fills up external WAN link and overwhelms stateful firewalls.
The second slide shows an attack targeting critical internal services that can have been identified by reconnaissance from the compromised devices. In addition, scanning activity associated with reconnaissance for additional devices can itself overload internal resources and cause outages.

In both cases, most of the critical activity occurs behind the corporate firewall, making it extremely challenging to detect and mitigate these threats.

The talk discusses a number of techniques that service providers use to secure their networks that enterprises will need to adopt Continue reading

RESTful control of Cumulus Linux ACLs

The diagram above shows how the Cumulus Linux 3.4 HTTP API can be extended to include the functionality described in REST API for Cumulus Linux ACLs. Fast programmatic control of Cumulus Linux ACLs addresses a number of interesting use cases, including: DDoS mitigationElephant flow marking, and Triggered remote packet capture using filtered ERSPAN.

The Github pphaal/acl_server project INSTALL page describes how to install the acl_server daemon and configure the NGINX web server front end for the Cumulus Linux REST API to include the acl_server functions. The integration ensures that the same access controls configured for the REST API apply to the acl_server functions, which appear under the /acl/ path.

The following examples demonstrate the REST API.

Create an ACL

curl -X PUT -H 'Content-Type:application/json' --data '["[iptables]","-A FORWARD --in-interface swp+ -d 10.10.100.10 -p udp --sport 53 -j DROP"]' -k -u 'cumulus:CumulusLinux!' https://10.0.0.52:8080/acl/ddos1
ACLs are sent as a JSON encoded array of strings. Each string will be written as a line in a file stored under /etc/cumulus/acl/policy.d/ - See Cumulus Linux: Netfilter - ACLs. For example, the rule above will be written to the file 50rest-ddos1.rules with the following Continue reading

Real-time WiFi heat map

Real-time Wifi-Traffic Heatmap (source code GitHub: cod3monk/showfloor-heatmap) displays real-time WiFi traffic from SC17 (The International Conference for High Performance Computing, Networking, Storage and Analysis, November 12-17, 2017). Click on the link to see live data.

The Cisco Wireless access points in the conference network don't currently support sFlow, however, the access points are connected to Juniper EX switches which stream sFlow telemetry to an instance of sFlow-RT analytics software that provides real-time usage metrics for the heat map.

Wireless describes the additional visibility delivered by sFlow capable wireless access points, including: air time, channel, retransmissions, receive / transmit speeds, power, signal to noise ratio, etc. With sFlow enabled wireless access points, additional information could be layered on the heat map. The sFlow.org web site lists network products and vendors that support the sFlow standard.
1 5 6 7 8 9 14