Category Archives for "sFlow"

Hybrid OpenFlow ECMP testbed


SDN fabric controller for commodity data center switches describes how the real-time visibility and hybrid control capabilities of commodity data center switches can be used to automatically adapt the network to changing traffic patterns and optimize performance. The article identifies hybrid OpenFlow as a critical component of the solution, allowing SDN to be combined with proven distributed routing protocols (e.g. BGP, IS-IS, OSPF) to deliver scalable, production-ready solutions that fully leverage the capabilities of commodity hardware.

This article will take the example of large flow marking that has been demonstrated using physical switches and show how Mininet can be used to emulate hybrid control of data center networks and deliver realistic results.
The article Elephant Detection in Virtual Switches & Mitigation in Hardware describes a demonstration by VMware and Cumulus Networks that shows how real-time detection and marking of large "Elephant" flows can dramatically improve application response time for small, latency-sensitive "Mouse" flows without impacting the throughput of the Elephants - see Marking large flows for additional background.
Performance optimizing hybrid OpenFlow controller demonstrated how hybrid OpenFlow can be used to mark Elephant flows on a top of rack switch. However, building test networks with physical Continue reading

REST API for Cumulus Linux ACLs

RESTful control of Cumulus Linux ACLs included a proof of concept script that demonstrated how to remotely control iptables entries in Cumulus Linux.  Cumulus Linux in turn converts the standard Linux iptables rules into the hardware ACLs implemented by merchant silicon switch ASICs to deliver line rate filtering.

Previous blog posts demonstrated how remote control of Cumulus Linux ACLs can be used for DDoS mitigation and Large "Elephant" flow marking.

A more advanced version of the script is now available on GitHub:

https://github.com/pphaal/acl_server/

The new script adds the following features:
  1. It now runs as a daemon.
  2. Exceptions generated by cl-acltool are caught and handled.
  3. Rules are compiled asynchronously, reducing the response time of REST calls.
  4. Updates are batched, supporting hundreds of operations per second.
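
As a quick illustration of driving the API remotely, the following minimal sketch installs and removes named ACLs over HTTP. It assumes the daemon listens on TCP port 8080 and accepts a JSON list of iptables rule lines via PUT to /acl/<name> (the convention used by the sFlow-RT controller script later in this archive); the switch address and rule are illustrative, so check the acl_server README for the exact paths before relying on it.

#!/usr/bin/env python
# Minimal sketch of a client for the acl_server REST API.
# Assumptions: the daemon listens on port 8080 and accepts a JSON list of
# iptables rule lines via PUT /acl/<name>; DELETE /acl/<name> removes a rule.
# Verify against the acl_server documentation before relying on this.
import json
import requests

SWITCH = '10.0.0.233'                      # illustrative management address
BASE = 'http://%s:8080/acl/' % SWITCH

def put_acl(name, rules):
    """Install (or replace) a named ACL from a list of iptables lines."""
    r = requests.put(BASE + name, data=json.dumps(rules),
                     headers={'Content-Type': 'application/json'})
    r.raise_for_status()

def delete_acl(name):
    """Remove a previously installed ACL."""
    requests.delete(BASE + name).raise_for_status()

if __name__ == '__main__':
    # block a hypothetical UDP reflection attack aimed at 192.168.1.1
    put_acl('ddos1', [
        '[iptables]',
        '# block UDP reflection attack',
        '-A FORWARD --in-interface swp+ -d 192.168.1.1 -p udp --sport 53 -j DROP'
    ])
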
The script doesn't provide any security, which may be acceptable if access to the REST API is limited to the management port, but is generally unacceptable for production deployments.

Fortunately, Cumulus Linux is an open Linux distribution that allows additional software components to be installed. Rather than being forced to add authentication and encryption to the script, it is possible to install additional software and leverage the capabilities of a mature web server such as Apache. Continue reading

Fabric visibility with Cumulus Linux

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2-minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try, just register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance and this article will describe how to obtain the topology from a Cumulus Networks fabric.

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides Continue reading
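
Fabric View needs the topology as structured data, and PTM already describes the expected cabling in a graphviz topology.dot file, so a small script can extract the link list. The sketch below is illustrative only: it assumes the common "node":"port" -- "node":"port" edge syntax used in PTM examples, and the JSON output schema is a placeholder rather than the exact format Fabric View expects.

#!/usr/bin/env python
# Sketch: extract fabric links from a PTM topology.dot file.
# Assumptions: edges use the form "node":"port" -- "node":"port"; the JSON
# schema emitted here is illustrative, not Fabric View's exact input format.
import json
import re
import sys

LINK = re.compile(r'"(?P<n1>[^"]+)":"(?P<p1>[^"]+)"\s*--\s*"(?P<n2>[^"]+)":"(?P<p2>[^"]+)"')

def parse_topology(path):
    links = []
    with open(path) as f:
        for line in f:
            m = LINK.search(line)
            if m:
                links.append({'node1': m.group('n1'), 'port1': m.group('p1'),
                              'node2': m.group('n2'), 'port2': m.group('p2')})
    return {'links': links}

if __name__ == '__main__':
    # e.g. python ptm_topology.py topology.dot > topology.json
    print(json.dumps(parse_topology(sys.argv[1]), indent=2))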

DDoS flood protection


Denial of Service attacks represent a significant threat to the ongoing operations of many businesses. When most revenue is derived from online operations, a DDoS attack can put a company out of business. There are many flavors of DDoS attacks, but the objective is always the same: to saturate a resource, such as a router, switch, firewall or web server, with multiple simultaneous bogus requests from many different sources. These attacks generate large volumes of traffic; 100Gbit/s attacks are now common, making mitigation a challenge.

The 3-minute video demonstrates Flood Protect - a DDoS mitigation solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time detection and mitigation of DDoS attacks. Flood Protect is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications provide visibility and accelerate fabric performance, applying controls to reduce latency and increase throughput.
An early version of Flood Protect won the 2014 SDN Idol competition in a joint demonstration with Brocade.
Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Stop thief!

The Host sFlow project recently added CPU steal to the set of CPU metrics it exports.
steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in other operating systems
when running in a virtualized environment
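
To make the definition concrete, the short sketch below reads the aggregate cpu line from /proc/stat twice and reports the fraction of time stolen over the interval; it is a minimal illustration of the counter described above rather than how hsflowd itself computes the metric.

#!/usr/bin/env python
# Minimal sketch: compute percentage CPU steal from two readings of
# /proc/stat. The 8th value after the "cpu" label is the steal counter
# described in the man page excerpt above. Not how hsflowd computes it.
import time

def cpu_times():
    with open('/proc/stat') as f:
        fields = f.readline().split()          # aggregate "cpu" line
    values = [int(v) for v in fields[1:]]
    steal = values[7] if len(values) > 7 else 0
    return sum(values), steal

if __name__ == '__main__':
    t1, s1 = cpu_times()
    time.sleep(10)
    t2, s2 = cpu_times()
    print('cpu_steal %.2f%%' % (100.0 * (s2 - s1) / max(t2 - t1, 1)))
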
Keeping close track of the stolen time metric is particularly important when running virtual machines in a public cloud. For example, Netflix and Stolen Time includes the following discussion:
So how does Netflix handle this problem when using Amazon’s Cloud? Adrian admits that they tracked this statistic so closely that when an instance crossed a stolen time threshold the standard operating procedure at Netflix was to kill the VM and start it up on a different hypervisor. What Netflix realized over time was that once a VM was performing poorly because another VM was crashing the party, usually due to a poorly written or compute intensive application hogging the machine, it never really got any better and their best learned approach was to get off that machine.
The following articles describe how to monitor public cloud instances using Host sFlow agents:
The CPU steal metric is particularly relevant to Network Function Virtualization (NFV). Virtual Continue reading

InfluxDB and Grafana

Cluster performance metrics describes how to use sFlow-RT to calculate metrics and post them to Graphite. This article will describe how to use sFlow with the InfluxDB time series database and Grafana dashboard builder.

The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Over 40 vendors implement the sFlow standard and compatible products are listed on sFlow.org. The open source Host sFlow agent exports standard sFlow metrics from hosts. For additional background, the Velocity conference talk provides an introduction to sFlow and a case study from a large social networking site.
It is possible to simply convert the raw sFlow metrics into InfluxDB metrics. The sflow2graphite.pl script provides an example that can be modified to support InfluxDB's native format, or used unmodified with the InfluxDB Graphite input plugin. However, there are scalability advantages to placing the sFlow-RT analytics engine in front of the time series database. For example, in large-scale cloud environments the metrics for each member of a dynamic pool aren't necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT tracks all the Continue reading
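
A minimal version of this pipeline can be sketched as a polling loop that reads summary metrics from sFlow-RT's REST API and posts them to InfluxDB using the HTTP line protocol. The hostnames, database name and metric list below are assumptions to adapt to your deployment; sFlow-RT listens on port 8008 by default and recent InfluxDB releases accept line protocol writes on port 8086.

#!/usr/bin/env python
# Sketch: poll metrics from sFlow-RT and write them to InfluxDB.
# Assumptions: sFlow-RT REST API on port 8008, InfluxDB line protocol
# /write endpoint on port 8086, database "sflow", and the metric names
# listed below - adjust all of these to match your environment.
import time
import requests

SFLOW_RT = 'http://localhost:8008'
INFLUX   = 'http://localhost:8086/write'
METRICS  = 'load_one,bytes_in,bytes_out'

while True:
    resp = requests.get('%s/metric/ALL/%s/json' % (SFLOW_RT, METRICS))
    lines = []
    for m in resp.json():
        # one line protocol point per agent/metric pair
        lines.append('%s,agent=%s value=%s'
                     % (m['metricName'], m['agent'], m['metricValue']))
    if lines:
        requests.post(INFLUX, params={'db': 'sflow'}, data='\n'.join(lines))
    time.sleep(15)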

Monitoring leaf and spine fabric performance


A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2-minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance. Fabric View is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications can automatically respond to problems and apply controls to protect against DDoS attacks, reduce latency and increase throughput.

Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Open vSwitch 2014 Fall Conference


Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. Open vSwitch is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2014 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: large-scale operational experiences at Rackspace, implementing stateful firewalls, Docker networking, and acceleration technologies (Intel DPDK and Netmap/VALE).

The video above is a recording of the following sFlow related talk from the conference:
Traffic visibility and control with sFlow (Peter Phaal, InMon)
sFlow instrumentation has been included in Open vSwitch since version 0.99.1 (released 25 Jan 2010). This talk will introduce the sFlow architecture and discuss how it differs from NetFlow/IPFIX, particularly in regards to delivering real-time flow analytics to an SDN controller. The talk will demonstrate that sFlow measurements from Open vSwitch are identical to sFlow measurements made in hardware on bare metal switches, providing unified, end-to-end, measurement across physical and virtual networks. Finally, Open vSwitch / Mininet will be used to demonstrate Continue reading

SDN fabric controllers

There is an ongoing debate in the software defined networking community about the functional split between a software edge and the physical core. Brad Hedlund argues the case in On choosing VMware NSX or Cisco ACI that a software-only solution maximizes flexibility and creates fluid resource pools. Brad argues for a network overlay architecture that is entirely software-based and completely independent of the underlying physical network. On the other hand, Ivan Pepelnjak argues in Overlay-to-underlay network interactions: document your hidden assumptions that the physical core cannot be ignored and, when you get past the marketing hype, even the proponents of network virtualization acknowledge the importance of the physical network in delivering edge services.

Despite differences, the advantages of a software-based network edge are compelling and there is emerging consensus behind this architecture with a large number of solutions available, including: Hadoop, Mesos, OpenStack, VMware NSX, Juniper OpenContrail, Midokura Midonet, Nuage Networks Virtual Services Platform, CPLANE Dynamic Virtual Networks and PLUMgrid Open Networking Suite.

In addition, the move to a software based network edge is leading to the adoption of configuration management and deployment tools from the DevOps Continue reading

Super NORMAL

HP proposes hybrid OpenFlow discussion at Open Daylight design forum describes some of the benefits of integrated hybrid OpenFlow and the reasons why the OpenDaylight community would be a good venue for addressing operational and multi-vendor interoperability issues relating to hybrid OpenFlow.

HP's slide presentation from the design forum, OpenFlow-hybrid Mode, gives an overview of hybrid mode OpenFlow and its benefits. The advantage of hybrid mode in leveraging the proven scalability and operational robustness of existing distributed control mechanisms and complementing them with centralized SDN control is compelling, and a number of vendors have released support, including: Alcatel-Lucent Enterprise, Brocade, Extreme, Hewlett-Packard, Mellanox, and Pica8. HP's presentation goes on to propose enhancements to the OpenDaylight controller to support hybrid OpenFlow agents.

InMon recently built a hybrid OpenFlow controller and, based on our experiences, this article will discuss how integrated hybrid mode is currently implemented on the switches, examine operational issues, and propose an agent profile for hybrid OpenFlow designed to reduce operational complexity, particularly when addressing traffic engineering use cases such as DDoS mitigation, large flow marking and large flow steering on ECMP/LAG networks.

Mechanisms for Optimizing LAG/ECMP Component Link Utilization in Networks is an IETF Continue reading

SDN control of hybrid packet / optical leaf and spine network

9/19 DemoFriday: CALIENT, Cumulus Networks and InMon Demo SDN Optimization of Hybrid Packet / Optical Data Center Fabric demonstrated how network analytics can be used to optimize traffic flows across a network composed of bare metal packet switches running Cumulus Linux and Calient Optical Circuit switches.


The short video above shows how the Calient optical circuit switch (OCS) uses two grids of micro-mirrors to create optical paths. The optical switching technology has a number of interesting properties:
  • Pure optical cut-through: the speed of the link is limited only by the top of rack transceiver speeds (i.e. it scales to 100G, 400G and beyond without having to upgrade the OCS)
  • Ultra low latency - less than 50ns
  • Lower cost than an equivalent packet switch
  • Ultra low power (50W vs. 6kW for a comparable packet switch)
The challenge is integrating the OCS into a hybrid data center network design to leverage the strengths of both packet switching and optical switching technologies.

The diagram shows the hybrid network that was demonstrated. The top of rack switches are bare metal switches running Cumulus Linux. The spine layer consists of a Cumulus Linux bare metal switch and a Calient Technologies optical circuit switch. The bare metal Continue reading

HP proposes hybrid OpenFlow discussion at Open Daylight design forum

Hewlett-Packard, an Open Daylight platinum member, is proposing a discussion of integrated hybrid OpenFlow at the upcoming Open Daylight Developer Design Forum, September 29 - 30, 2014, Santa Clara.

Topics for ODL Design Summit from HP contains the following proposal, making the case for integrated hybrid OpenFlow:
We would like to share our experiences with Customer SDN deployments that require OpenFlow hybrid mode. Why it matters, implementation considerations, and how to achieve better support for it in ODL

OpenFlow-compliant switches come in two types: OpenFlow-only, and OpenFlow-hybrid. OpenFlow-only switches support only OpenFlow operation, in those switches all packets are processed by the OpenFlow pipeline, and cannot be processed otherwise. OpenFlow-hybrid switches support both OpenFlow operation and normal Ethernet switching operation, i.e. traditional L2 Ethernet switching, VLAN isolation, L3 routing (IPv4 routing, IPv6 routing...), ACL and QoS processing

The rationale for supporting hybrid mode is twofold:
  1. Controlled switches have decades of embedded traditional networking logic. The controller does not add value to a solution if it replicates traditional forwarding logic. One alternative controller responsibility is to provide forwarding decisions when it wants to override the traditional data-plane forwarding decision.
  2. Controllers can be gradually incorporated into a traditional network. Continue reading

New Feature in Cumulus Linux 2.2: sFlow

sFlow is an open protocol, newly supported in Cumulus Linux 2.2, that enables a collector to determine what is going on in a complex network.

It is used to collect statistics, such as packet counts, error counts, CPU usage, etc from a large number of individual switches. What is especially interesting is that it can be used to collect sampled packets (usually only the first n bytes, containing the header), along with some metadata about those packets.

Bringing sFlow to Cumulus Linux was particularly easy, because “hsflowd” was already available for implementing sFlow support on Linux servers. We were able to reuse that existing code, with extremely minimal modification, to implement sFlow on our Linux based switches.

sFlow allows a collector to get a statistical view of what is going on in a collection of switches, approaching per-flow granularity. This is extremely useful information to present to users for capacity planning and debugging purposes, but things really get interesting when the collector can make decisions based on the information.
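
One simple way for a collector to act on the data is to define a flow and poll for flows that exceed a bandwidth threshold. The sketch below does this via sFlow-RT's REST API; the port, flow keys, threshold and response fields are assumptions to check against the sFlow-RT documentation, and the example only prints what it finds rather than taking any action.

#!/usr/bin/env python
# Sketch: define a flow in sFlow-RT and poll for large ("elephant") flows.
# Assumptions: sFlow-RT on localhost:8008, the /flow and /activeflows REST
# endpoints, and the response fields used below - verify against the
# sFlow-RT documentation for your version. Prints matches, takes no action.
import time
import requests

RT = 'http://localhost:8008'
ELEPHANT_BPS = 100000000 / 8          # 100 Mbit/s expressed in bytes/second

# flow keyed on source and destination address, measured in bytes/second
requests.put(RT + '/flow/pairs/json',
             json={'keys': 'ipsource,ipdestination', 'value': 'bytes'})

while True:
    r = requests.get(RT + '/activeflows/ALL/pairs/json',
                     params={'maxFlows': 10, 'minValue': ELEPHANT_BPS})
    for flow in r.json():
        print('elephant %s on %s (%.1f Mbit/s)'
              % (flow['key'], flow['agent'], flow['value'] * 8 / 1e6))
    time.sleep(5)
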

For example, our friends at InMon implemented detection of elephant flows (high bandwidth), followed by marking those flows on the switch at network ingress for special QoS handling. This nearly Continue reading

DDoS mitigation with Cumulus Linux

Figure 1: Real-time SDN Analytics for DDoS mitigation
Figure 1 shows how service providers are ideally positioned to mitigate large flood attacks directed at their customers. The mitigation solution involves an SDN controller that rapidly detects and filters out attack traffic and protects the customer's Internet access.

This article builds on the test setup described in RESTful control of Cumulus Linux ACLs in order to implement the ONS 2014 SDN Idol winning distributed denial of service (DDoS) mitigation solution - Real-time SDN Analytics for DDoS mitigation.

The following sFlow-RT application implements basic DDoS mitigation functionality:
include('extras/json2.js');

// Define large flow as greater than 100Mbits/sec for 1 second or longer
var bytes_per_second = 100000000/8;
var duration_seconds = 1;

var id = 0;
var controls = {};

setFlow('udp_target',
  {keys:'ipdestination,udpsourceport', value:'bytes',
   filter:'direction=egress', t:duration_seconds}
);

setThreshold('attack',
  {metric:'udp_target', value:bytes_per_second, byFlow:true, timeout:4,
   filter:{ifspeed:[1000000000]}}
);

setEventHandler(function(evt) {
  // only apply one control per flow
  if(controls[evt.flowKey]) return;

  var rulename = 'ddos' + id++;
  var keys = evt.flowKey.split(',');
  var acl = [
    '[iptables]',
    '# block UDP reflection attack',
    '-A FORWARD --in-interface swp+ -d ' + keys[0]
    + ' -p udp --sport ' + keys[1] + ' -j DROP'
  ];
  // push the ACL to the REST API on the switch that reported the attack
  http('http://'+evt.agent+':8080/acl/'+rulename,
       'put','application/json',JSON.stringify(acl));
  controls[evt.flowKey] = {
    agent:evt.agent,
    dataSource:evt.dataSource,
    rulename:rulename,
Continue reading

Docker performance monitoring

IT’S HERE: DOCKER 1.0 recently announced the first production release of the Docker Linux container platform. Docker is seeing explosive growth and has already been embraced by IBM, Red Hat and Rackspace. Today the open source Host sFlow project released support for Docker, exporting standard sFlow performance metrics for Linux containers and unifying Linux containers with the broader sFlow ecosystem.
Visibility and the software defined data center
Host sFlow Docker support simplifies data center performance management by unifying monitoring of Linux containers with monitoring of virtual machines (Hyper-V, KVM/libvirt, Xen/XCP/XenServer), virtual switches (Open vSwitch, Hyper-V Virtual Switch, IBM Distributed Virtual Switch, HP FlexFabric Virtual Switch), servers (Linux, Windows, Solaris, AIX, FreeBSD), and physical networks (over 40 vendors, including: A10, Alcatel-Lucent, Arista, Brocade, Cisco, Cumulus, Extreme, F5, Hewlett-Packard, Hitachi, Huawei, IBM, Juniper, Mellanox, NEC, ZTE). In addition, standardizing metrics allows measurements to be shared among different tools, further reducing operational complexity.


The talk provides additional background on the sFlow standard and case studies. The remainder of this article describes how to use Host sFlow to monitor a Docker server pool.

First, download, compile and install the Host sFlow agent on a Docker host (Note: The agent needs to Continue reading

Microsoft Office 365 outage

6/24/2014 Information Week - Microsoft Exchange Online Suffers Service Outage, "Service disruptions with Microsoft's Exchange Online left many companies with no email on Tuesday."

The following entry on the Microsoft 365 community forum describes the incident:
====================================

Closure Summary: On Tuesday, June 24, 2014, at approximately 1:11 PM UTC, engineers received reports of an issue in which some customers were unable to access the Exchange Online service. Investigation determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact. The issue was successfully fixed on Tuesday, June 24, 2014, at 9:50 PM UTC.

Customer Impact: Affected customers were unable to access the Exchange Online service.

Incident Start Time: Tuesday, June 24, 2014, at 1:11 PM UTC

Incident End Time: Tuesday, June 24, 2014, at 9:50 PM UTC

=====================================
The closure summary shows that operators took 8 hours and 39 minutes to manually diagnose and remediate the problem with degraded networking infrastructure. The network-related outage described in this example is not an isolated incident; other incidents described on this blog include: Packet loss, Amazon EC2 outage, Gmail outage, Delay vs utilization for Continue reading

RESTful control of Cumulus Linux ACLs

Figure 1: Elephants and Mice
Elephant Detection in Virtual Switches & Mitigation in Hardware discusses a VMware and Cumulus demonstration, Elephants and Mice, in which the virtual switch on a host detects and marks large "Elephant" flows and the hardware switch enforces priority queueing to prevent Elephant flows from adversely affecting latency of small "Mice" flows.

This article demonstrates a self contained real-time Elephant flow marking solution that leverages the visibility and control features of Cumulus Linux.

SDN fabric controller for commodity data center switches provides some background on the capabilities of the commodity switch hardware used to run Cumulus Linux. The article describes how the measurement and control capabilities of the hardware can be used to maximize data center fabric performance:
Exposing the ACL configuration files through a RESTful API offers a straightforward method of remotely creating, reading, updating, deleting and listing ACLs.

For example, the following command creates a filter called Continue reading

Cumulus Networks, sFlow and data center automation

Cumulus Networks and InMon Corp have ported the open source Host sFlow agent to the upcoming Cumulus Linux 2.1 release. The Host sFlow agent already supports Linux, Windows, FreeBSD, Solaris, and AIX operating systems and KVM, Xen, XCP, XenServer, and Hyper-V hypervisors, delivering a standard set of performance metrics from switches, servers, hypervisors, virtual switches, and virtual machines - see Visibility and the software defined data center

The Cumulus Linux platform makes it possible to run the same open source agent on switches, servers, and hypervisors - providing unified end-to-end visibility across the data center. The open networking model that Cumulus is pioneering offers exciting opportunities. Cumulus Linux allows popular open source server orchestration tools to also manage the network, and the combination of real-time, data center wide analytics with orchestration makes it possible to create self-optimizing data centers.

Install and configure Host sFlow agent

The following command installs the Host sFlow agent on a Cumulus Linux switch:
sudo apt-get install hsflowd
Note: Network managers may find this command odd since it is usually not possible to install third party software on switch hardware. However, what is even more radical is that Cumulus Linux allows users to download source Continue reading

SDN fabric controller for commodity data center switches

Figure 1: Rise of merchant silicon
Figure 1 illustrates the rapid transition to merchant silicon among leading data center network vendors, including: Alcatel-Lucent, Arista, Cisco, Cumulus, Dell, Extreme, Juniper, Hewlett-Packard, and IBM.

This article will examine some of the factors leading to commoditization of network hardware and the role that software defined networking (SDN) plays in coordinating hardware resources to deliver increased network efficiency.
Figure 2: Fabric: A Retrospective on Evolving SDN
The article, Fabric: A Retrospective on Evolving SDN by Martin Casado, Teemu Koponen, Scott Shenker, and Amin Tootoonchian, makes the case for a two tier SDN architecture; comprising a smart edge and an efficient core.
Table 1: Edge vs Fabric Functionality
Virtualization and advances in the networking capability of x86-based servers are drivers behind this separation. Virtual machines are connected to each other and to the physical network using a software virtual switch. The software switch provides the flexibility to quickly develop and deploy advanced features like network virtualization, tenant isolation, distributed firewalls, etc. Network function virtualization (NFV) is moving firewall, load balancing, routing, etc. functions from dedicated appliances to virtual machines or embedding them within the virtual switches. The increased importance of network-centric software has Continue reading

Load balancing large flows on multi-path networks

Figure 1: Active control of large flows in a multi-path topology
Figure 1 shows initial results from the Mininet integrated hybrid OpenFlow testbed demonstrating that active steering of large flows using a performance-aware SDN controller significantly improves network throughput of multi-path network topologies.
Figure 2: Two path topology
The graph in Figure 1 summarizes results from topologies with 2, 3 and 4 equal cost paths. For example, the Mininet topology in Figure 2 has two equal cost paths of 10Mbit/s (shown in blue and red). The iperf traffic generator was used to create a continuous stream of 20 second flows from h1 to h3 and from h2 to h4. If traffic were perfectly balanced, each flow would achieve 10Mbit/s throughput. However, Figure 1 shows that the throughput obtained using hash-based ECMP load balancing is approximately 6.8Mbit/s. Interestingly, the average link throughput decreases as additional paths are added, dropping to approximately 6.2Mbit/s with four equal cost paths (see the blue bars in Figure 1).
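
For readers who want to reproduce the setup, the following is a minimal Mininet sketch of a two-path topology in the spirit of Figure 2 (the switch numbering in the figure may differ). Links are rate-limited to 10Mbit/s with TCLink, and an external controller capable of handling the looped topology, such as the hybrid OpenFlow controller described in the article, is assumed to be attached.

#!/usr/bin/env python
# Sketch of a Mininet two-path topology in the spirit of Figure 2 (exact
# switch numbering in the figure may differ). Each link is limited to
# 10Mbit/s. An external OpenFlow controller (e.g. the hybrid controller
# described in the article) is assumed to be listening on localhost;
# without a controller that understands the loop, traffic will not flow.
from mininet.net import Mininet
from mininet.topo import Topo
from mininet.link import TCLink
from mininet.node import RemoteController
from mininet.cli import CLI

class TwoPathTopo(Topo):
    def build(self):
        h1, h2, h3, h4 = [self.addHost('h%d' % i) for i in range(1, 5)]
        s1, s2 = self.addSwitch('s1'), self.addSwitch('s2')    # leaf switches
        s3, s4 = self.addSwitch('s3'), self.addSwitch('s4')    # two equal cost paths
        for h, s in [(h1, s1), (h2, s1), (h3, s2), (h4, s2)]:
            self.addLink(h, s, bw=10)
        for spine in (s3, s4):
            self.addLink(s1, spine, bw=10)
            self.addLink(spine, s2, bw=10)

if __name__ == '__main__':
    net = Mininet(topo=TwoPathTopo(), link=TCLink, controller=RemoteController)
    net.start()
    CLI(net)    # e.g. run h1 iperf -c h3 while h2 iperf -c h4
    net.stop()
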

To ensure that packets in a flow arrive in order at their destination, switch s3 computes a hash function over selected fields in the packets (e.g. source and destination IP addresses Continue reading