Peter

Author Archives: Peter

DDoS flood protection


Denial of Service attacks represent a significant threat to the ongoing operations of many businesses. When most revenue is derived from online operations, a DDoS attack can put a company out of business. There are many flavors of DDoS attacks, but the objective is always the same: to saturate a resource, such as a router, switch, firewall or web server, with multiple simultaneous bogus requests from many different sources. These attacks generate large volumes of traffic (100Gbit/s attacks are now common), making mitigation a challenge.

The 3 minute video demonstrates Flood Protect - a DDoS mitigation solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time detection and mitigation of DDoS attacks. Flood Protect is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications provide visibility and accelerate fabric performance, applying controls to reduce latency and increase throughput.
An early version of Flood Protect won the 2014 SDN Idol competition in a joint demonstration with Brocade Networks.
Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Stop thief!

The Host sFlow project recently added CPU steal to the set of CPU metrics exported.
steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in other operating systems
when running in a virtualized environment
Keeping close track of the stolen time metric is particularly important when managing virtual machines in a public cloud. For example, Netflix and Stolen Time includes the discussion:
So how does Netflix handle this problem when using Amazon’s Cloud? Adrian admits that they tracked this statistic so closely that when an instance crossed a stolen time threshold the standard operating procedure at Netflix was to kill the VM and start it up on a different hypervisor. What Netflix realized over time was that once a VM was performing poorly because another VM was crashing the party, usually due to a poorly written or compute intensive application hogging the machine, it never really got any better and their best learned approach was to get off that machine.
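As a rough illustration of how a stolen time percentage can be derived, the sketch below compares two samples of the aggregate `cpu` line from `/proc/stat` on Linux, where steal is the eighth counter. The sample values, the function names and the 10% threshold are assumptions made for illustration; the source does not state the threshold Netflix used, and the Host sFlow agent exports this metric without any such script.

```javascript
// Parse the aggregate "cpu" line from /proc/stat into an array of counters.
// Field order: user nice system idle iowait irq softirq steal guest guest_nice
function parseCpuLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields[0] !== 'cpu') throw new Error('expected aggregate cpu line');
  return fields.slice(1).map(Number);
}

// Percentage of CPU time stolen between two samples.
// Only the first 8 counters are summed because guest time is already
// accounted for in the user and nice counters.
function stealPercent(before, after) {
  const delta = after.map((v, i) => v - before[i]).slice(0, 8);
  const total = delta.reduce((a, b) => a + b, 0);
  return total > 0 ? (100 * delta[7]) / total : 0;
}

// Hypothetical threshold, chosen only for this example.
const STEAL_THRESHOLD_PERCENT = 10;

// Made-up sample values standing in for two reads of /proc/stat.
const t0 = parseCpuLine('cpu 1000 0 500 8000 100 0 0 400 0 0');
const t1 = parseCpuLine('cpu 1600 0 700 8900 100 0 0 600 0 0');

const pct = stealPercent(t0, t1);
const overThreshold = pct > STEAL_THRESHOLD_PERCENT;
console.log(pct.toFixed(1) + '% steal' + (overThreshold ? ' (over threshold)' : ''));
```

Working from counter deltas rather than absolute values matters because the counters in `/proc/stat` accumulate from boot, so a single read says little about current contention.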
The following articles describe how to monitor public cloud instances using Host sFlow agents:
The CPU steal metric is particularly relevant to Network Function Virtualization (NFV). Virtual Continue reading

InfluxDB and Grafana

Cluster performance metrics describes how to use sFlow-RT to calculate metrics and post them to Graphite. This article will describe how to use sFlow with the InfluxDB time series database and Grafana dashboard builder.

The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Over 40 vendors implement the sFlow standard and compatible products are listed on sFlow.org. The open source Host sFlow agent exports standard sFlow metrics from hosts. For additional background, the Velocity conference talk provides an introduction to sFlow and case study from a large social networking site.
It is possible to simply convert the raw sFlow metrics into InfluxDB metrics. The sflow2graphite.pl script provides an example that can be modified to support InfluxDB's native format, or used unmodified with the InfluxDB Graphite input plugin. However, there are scalability advantages to placing the sFlow-RT analytics engine in front of the time series database. For example, in large scale cloud environments the metrics for each member of a dynamic pool aren't necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT tracks all the Continue reading

Monitoring leaf and spine fabric performance


A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance. Fabric View is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications can automatically respond to problems and apply controls to protect against DDoS attacks, reduce latency and increase throughput.

Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Open vSwitch 2014 Fall Conference


Open vSwitch is an open source software virtual switch that is popular in cloud environments such as OpenStack. Open vSwitch is a standard Linux component that forms the basis of a number of commercial and open source solutions for network virtualization, tenant isolation, and network function virtualization (NFV) - implementing distributed virtual firewalls and routers.

The recent Open vSwitch 2014 Fall Conference agenda included a wide variety of speakers addressing a range of topics, including: large scale operation experiences at Rackspace, implementing stateful firewalls, Docker networking, and acceleration technologies (Intel DPDK and Netmap/VALE).

The video above is a recording of the following sFlow related talk from the conference:
Traffic visibility and control with sFlow (Peter Phaal, InMon)
sFlow instrumentation has been included in Open vSwitch since version 0.99.1 (released 25 Jan 2010). This talk will introduce the sFlow architecture and discuss how it differs from NetFlow/IPFIX, particularly with regard to delivering real-time flow analytics to an SDN controller. The talk will demonstrate that sFlow measurements from Open vSwitch are identical to sFlow measurements made in hardware on bare metal switches, providing unified, end-to-end measurement across physical and virtual networks. Finally, Open vSwitch / Mininet will be used to demonstrate Continue reading

SDN fabric controllers

Credit: sFlow.com
There is an ongoing debate in the software defined networking community about the functional split between a software edge and the physical core. Brad Hedlund argues the case in On choosing VMware NSX or Cisco ACI that a software only solution maximizes flexibility and creates fluid resource pools. Brad argues for a network overlay architecture that is entirely software based and completely independent of the underlying physical network. On the other hand, Ivan Pepelnjak argues in Overlay-to-underlay network interactions: document your hidden assumptions that the physical core cannot be ignored and, when you get past the marketing hype, even the proponents of network virtualization acknowledge the importance of the physical network in delivering edge services.

Despite differences, the advantages of a software based network edge are compelling and there is emerging consensus behind this architecture with a large number of solutions available, including: Hadoop, Mesos, OpenStack, VMware NSX, Juniper OpenContrail, Midokura Midonet, Nuage Networks Virtual Services Platform, CPLANE Dynamic Virtual Networks and PLUMgrid Open Networking Suite.

In addition, the move to a software based network edge is leading to the adoption of configuration management and deployment tools from the DevOps Continue reading

Super NORMAL

KennyK/Shutterstock
HP proposes hybrid OpenFlow discussion at Open Daylight design forum describes some of the benefits of integrated hybrid OpenFlow and the reasons why the OpenDaylight community would be a good venue for addressing operational and multi-vendor interoperability issues relating to hybrid OpenFlow.

HP's slide presentation from the design forum, OpenFlow-hybrid Mode, gives an overview of hybrid mode OpenFlow and its benefits. Hybrid mode leverages the proven scalability and operational robustness of existing distributed control mechanisms and complements them with centralized SDN control. This advantage is compelling and a number of vendors have released support, including: Alcatel Lucent Enterprise, Brocade, Extreme, Hewlett-Packard, Mellanox, and Pica8. HP's presentation goes on to propose enhancements to the OpenDaylight controller to support hybrid OpenFlow agents.

InMon recently built a hybrid OpenFlow controller and, based on our experiences, this article will discuss how integrated hybrid mode is currently implemented on the switches, examine operational issues, and propose an agent profile for hybrid OpenFlow designed to reduce operational complexity, particularly when addressing traffic engineering use cases such as DDoS mitigation, large flow marking and large flow steering on ECMP/LAG networks.

Mechanisms for Optimizing LAG/ECMP Component Link Utilization in Networks is an IETF Continue reading

SDN control of hybrid packet / optical leaf and spine network

9/19 DemoFriday: CALIENT, Cumulus Networks and InMon Demo SDN Optimization of Hybrid Packet / Optical Data Center Fabric demonstrated how network analytics can be used to optimize traffic flows across a network composed of bare metal packet switches running Cumulus Linux and Calient Optical Circuit switches.


The short video above shows how the Calient optical circuit switch (OCS) uses two grids of micro-mirrors to create optical paths. The optical switching technology has a number of interesting properties:
  • Pure optical cut-through, the speed of the link is limited only by the top of rack transceiver speeds (i.e. scales to 100G, 400G and beyond without having to upgrade the OCS)
  • Ultra low latency - less than 50ns
  • Lower cost than an equivalent packet switch
  • Ultra low power (50W vs. 6kW for comparable packet switch)
The challenge is integrating the OCS into a hybrid data center network design to leverage the strengths of both packet switching and optical switching technologies.

The diagram shows the hybrid network that was demonstrated. The top of rack switches are bare metal switches running Cumulus Linux. The spine layer consists of a Cumulus Linux bare metal switch and a Calient Technologies optical circuit switch. The bare metal Continue reading

HP proposes hybrid OpenFlow discussion at Open Daylight design forum

Hewlett-Packard, an Open Daylight platinum member, is proposing a discussion of integrated hybrid OpenFlow at the upcoming Open Daylight Developer Design Forum, September 29 - 30, 2014, Santa Clara.

Topics for ODL Design Summit from HP contains the following proposal, making the case for integrated hybrid OpenFlow:
We would like to share our experiences with Customer SDN deployments that require OpenFlow hybrid mode. Why it matters, implementation considerations, and how to achieve better support for it in ODL

OpenFlow-compliant switches come in two types: OpenFlow-only, and OpenFlow-hybrid. OpenFlow-only switches support only OpenFlow operation, in those switches all packets are processed by the OpenFlow pipeline, and cannot be processed otherwise. OpenFlow-hybrid switches support both OpenFlow operation and normal Ethernet switching operation, i.e. traditional L2 Ethernet switching, VLAN isolation, L3 routing (IPv4 routing, IPv6 routing...), ACL and QoS processing

The rationale for supporting hybrid mode is twofold:
  1. Controlled switches have decades of embedded traditional networking logic. The controller does not add value to a solution if it replicates traditional forwarding logic. One alternative controller responsibility is that provides forwarding decisions when it wants to override the traditional data-plane forwarding decision.
  2. Controllers can be gradually incorporated into a traditional network. Continue reading

DDoS mitigation with Cumulus Linux

Figure 1: Real-time SDN Analytics for DDoS mitigation
Figure 1 shows how service providers are ideally positioned to mitigate large flood attacks directed at their customers. The mitigation solution involves an SDN controller that rapidly detects and filters out attack traffic and protects the customer's Internet access.

This article builds on the test setup described in RESTful control of Cumulus Linux ACLs in order to implement the ONS 2014 SDN Idol winning distributed denial of service (DDoS) mitigation solution - Real-time SDN Analytics for DDoS mitigation.

The following sFlow-RT application implements basic DDoS mitigation functionality:
include('extras/json2.js');

// Define large flow as greater than 100Mbits/sec for 1 second or longer
var bytes_per_second = 100000000/8;
var duration_seconds = 1;

var id = 0;
var controls = {};

setFlow('udp_target',
  {keys:'ipdestination,udpsourceport', value:'bytes',
   filter:'direction=egress', t:duration_seconds}
);

setThreshold('attack',
  {metric:'udp_target', value:bytes_per_second, byFlow:true, timeout:4,
   filter:{ifspeed:[1000000000]}}
);

setEventHandler(function(evt) {
  if(controls[evt.flowKey]) return;

  var rulename = 'ddos' + id++;
  var keys = evt.flowKey.split(',');
  var acl = [
    '[iptables]',
    '# block UDP reflection attack',
    '-A FORWARD --in-interface swp+ -d ' + keys[0]
    + ' -p udp --sport ' + keys[1] + ' -j DROP'
  ];
  http('http://'+evt.agent+':8080/acl/'+rulename,
       'put','application/json',JSON.stringify(acl));
  controls[evt.flowKey] = {
    agent:evt.agent,
    dataSource:evt.dataSource,
    rulename:rulename,
Continue reading
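The step in the script excerpt above that turns a flow key into an ACL can be tried on its own, outside sFlow-RT. The following is a minimal sketch in plain JavaScript: `buildDropAcl` is a hypothetical helper name rather than part of sFlow-RT, and the example flow key values are invented. It also spells out the threshold arithmetic, converting 100Mbit/s into the bytes per second value used by the threshold.

```javascript
// 100 Mbit/s expressed in bytes per second, matching the script's threshold.
const bytesPerSecond = 100000000 / 8; // 12,500,000 bytes per second

// Build the ACL body for a flow key of the form
// "<ipdestination>,<udpsourceport>", the key order used by the flow definition.
// buildDropAcl is a hypothetical name used only for this illustration.
function buildDropAcl(flowKey) {
  const [dstIp, srcPort] = flowKey.split(',');
  return [
    '[iptables]',
    '# block UDP reflection attack',
    '-A FORWARD --in-interface swp+ -d ' + dstIp +
      ' -p udp --sport ' + srcPort + ' -j DROP'
  ];
}

// Invented example: UDP traffic from source port 53 toward 10.100.10.5.
const acl = buildDropAcl('10.100.10.5,53');
console.log(JSON.stringify(acl, null, 2));
```

Keying the flow on destination address and UDP source port means each generated rule drops one reflected service aimed at one victim, leaving the victim's other traffic untouched.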

Docker performance monitoring

IT’S HERE: DOCKER 1.0 recently announced the first production release of the Docker Linux container platform. Docker is seeing explosive growth and has already been embraced by IBM, Red Hat and Rackspace. Today the open source Host sFlow project released support for Docker, exporting standard sFlow performance metrics for Linux containers and unifying Linux containers with the broader sFlow ecosystem.
Visibility and the software defined data center
Host sFlow Docker support simplifies data center performance management by unifying monitoring of Linux containers with monitoring of virtual machines (Hyper-V, KVM/libvirt, Xen/XCP/XenServer), virtual switches (Open vSwitch, Hyper-V Virtual Switch, IBM Distributed Virtual Switch, HP FlexFabric Virtual Switch), servers (Linux, Windows, Solaris, AIX, FreeBSD), and physical networks (over 40 vendors, including: A10, Alcatel-Lucent, Arista, Brocade, Cisco, Cumulus, Extreme, F5, Hewlett-Packard, Hitachi, Huawei, IBM, Juniper, Mellanox, NEC, ZTE). In addition, standardizing metrics allows measurements to be shared among different tools, further reducing operational complexity.


The talk provides additional background on the sFlow standard and case studies. The remainder of this article describes how to use Host sFlow to monitor a Docker server pool.

First, download, compile and install the Host sFlow agent on a Docker host (Note: The agent needs to Continue reading

Microsoft Office 365 outage

6/24/2014 Information Week - Microsoft Exchange Online Suffers Service Outage, "Service disruptions with Microsoft's Exchange Online left many companies with no email on Tuesday."

The following entry on the Microsoft 365 community forum describes the incident:
====================================

Closure Summary: On Tuesday, June 24, 2014, at approximately 1:11 PM UTC, engineers received reports of an issue in which some customers were unable to access the Exchange Online service. Investigation determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact. The issue was successfully fixed on Tuesday, June 24, 2014, at 9:50 PM UTC.

Customer Impact: Affected customers were unable to access the Exchange Online service.

Incident Start Time: Tuesday, June 24, 2014, at 1:11 PM UTC

Incident End Time: Tuesday, June 24, 2014, at 9:50 PM UTC

=====================================
The closure summary shows that operators took 8 hours 39 minutes to manually diagnose and remediate the problem with degraded networking infrastructure. The network related outage described in this example is not an isolated incident; other incidents described on this blog include: Packet loss, Amazon EC2 outage, Gmail outage, Delay vs utilization for Continue reading

RESTful control of Cumulus Linux ACLs

Figure 1: Elephants and Mice
Elephant Detection in Virtual Switches & Mitigation in Hardware discusses a VMware and Cumulus demonstration, Elephants and Mice, in which the virtual switch on a host detects and marks large "Elephant" flows and the hardware switch enforces priority queueing to prevent Elephant flows from adversely affecting latency of small "Mice" flows.

This article demonstrates a self-contained real-time Elephant flow marking solution that leverages the visibility and control features of Cumulus Linux.

SDN fabric controller for commodity data center switches provides some background on the capabilities of the commodity switch hardware used to run Cumulus Linux. The article describes how the measurement and control capabilities of the hardware can be used to maximize data center fabric performance:
Exposing the ACL configuration files through a RESTful API offers a straightforward method of remotely creating, reading, updating, deleting and listing ACLs.

For example, the following command creates a filter called Continue reading

Cumulus Networks, sFlow and data center automation

Cumulus Networks and InMon Corp have ported the open source Host sFlow agent to the upcoming Cumulus Linux 2.1 release. The Host sFlow agent already supports Linux, Windows, FreeBSD, Solaris, and AIX operating systems and KVM, Xen, XCP, XenServer, and Hyper-V hypervisors, delivering a standard set of performance metrics from switches, servers, hypervisors, virtual switches, and virtual machines - see Visibility and the software defined data center.

The Cumulus Linux platform makes it possible to run the same open source agent on switches, servers, and hypervisors - providing unified end-to-end visibility across the data center. The open networking model that Cumulus is pioneering offers exciting opportunities. Cumulus Linux allows popular open source server orchestration tools to also manage the network, and the combination of real-time, data center wide analytics with orchestration makes it possible to create self-optimizing data centers.

Install and configure Host sFlow agent

The following command installs the Host sFlow agent on a Cumulus Linux switch:
sudo apt-get install hsflowd
Note: Network managers may find this command odd since it is usually not possible to install third party software on switch hardware. However, what is even more radical is that Cumulus Linux allows users to download source Continue reading

SDN fabric controller for commodity data center switches

Figure 1: Rise of merchant silicon
Figure 1 illustrates the rapid transition to merchant silicon among leading data center network vendors, including: Alcatel-Lucent, Arista, Cisco, Cumulus, Dell, Extreme, Juniper, Hewlett-Packard, and IBM.

This article will examine some of the factors leading to commoditization of network hardware and the role that software defined networking (SDN) plays in coordinating hardware resources to deliver increased network efficiency.
Figure 2: Fabric: A Retrospective on Evolving SDN
The article, Fabric: A Retrospective on Evolving SDN by Martin Casado, Teemu Koponen, Scott Shenker, and Amin Tootoonchian, makes the case for a two tier SDN architecture, comprising a smart edge and an efficient core.
Table 1: Edge vs Fabric Functionality
Virtualization and advances in the networking capability of x86 based servers are drivers behind this separation. Virtual machines are connected to each other and to the physical network using a software virtual switch. The software switch provides the flexibility to quickly develop and deploy advanced features like network virtualization, tenant isolation, distributed firewalls, etc. Network function virtualization (NFV) is moving firewall, load balancing, routing, etc. functions from dedicated appliances to virtual machines or embedding them within the virtual switches. The increased importance of network centric software has Continue reading

Load balancing large flows on multi-path networks

Figure 1: Active control of large flows in a multi-path topology
Figure 1 shows initial results from the Mininet integrated hybrid OpenFlow testbed demonstrating that active steering of large flows using a performance aware SDN controller significantly improves network throughput of multi-path network topologies.
Figure 2: Two path topology
The graph in Figure 1 summarizes results from topologies with 2, 3 and 4 equal cost paths. For example, the Mininet topology in Figure 2 has two equal cost paths of 10Mbit/s (shown in blue and red). The iperf traffic generator was used to create a continuous stream of 20 second flows from h1 to h3 and from h2 to h4. If traffic were perfectly balanced, each flow would achieve 10Mbit/s throughput. However, Figure 1 shows that the throughput obtained using hash based ECMP load balancing is approximately 6.8Mbit/s. Interestingly, the average link throughput decreases as additional paths are added, dropping to approximately 6.2Mbit/s with four equal cost paths (see the blue bars in Figure 1).
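A back-of-the-envelope model helps explain why hash based balancing falls short of 10Mbit/s in the two path case. The sketch below assumes exactly two concurrent flows, each hashed independently and uniformly onto one of the equal cost paths, with a shared link split evenly between them. These are simplifying assumptions, not the testbed's measured behavior, and the function name is illustrative. The model covers only the two path topology in Figure 2 and does not attempt to reproduce the three and four path results.

```javascript
// Expected per-flow throughput when two flows are hashed independently and
// uniformly onto `paths` equal cost links of `linkMbps` each.
// Simplified model: ignores TCP dynamics and any other traffic.
function expectedFlowThroughput(paths, linkMbps) {
  const pCollide = 1 / paths;     // probability both flows pick the same link
  const shared = linkMbps / 2;    // on collision the link is split evenly
  return pCollide * shared + (1 - pCollide) * linkMbps;
}

// Two 10Mbit/s paths, as in Figure 2.
const twoPath = expectedFlowThroughput(2, 10);
console.log(twoPath + ' Mbit/s expected per flow');
```

Under these assumptions the two flows collide half the time, so the expected per-flow throughput is 7.5Mbit/s even before protocol overheads, which is in the same region as the roughly 6.8Mbit/s reported for hash based ECMP.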

To ensure that packets in a flow arrive in order at their destination, switch s3 computes a hash function over selected fields in the packets (e.g. source and destination IP addresses Continue reading

Mininet integrated hybrid OpenFlow testbed

Figure 1: Hybrid Programmable Forwarding Planes
Integrated hybrid OpenFlow combines OpenFlow and existing distributed routing protocols to deliver robust software defined networking (SDN) solutions. Performance optimizing hybrid OpenFlow controller describes how the sFlow and OpenFlow standards combine to deliver visibility and control to address challenges including: DDoS mitigation, ECMP load balancing, LAG load balancing, and large flow marking.

A number of vendors support sFlow and integrated hybrid OpenFlow today; examples described on this blog include: Alcatel-Lucent, Brocade, and Hewlett-Packard. However, building a physical testbed is expensive and time consuming. This article describes how to build an sFlow and hybrid OpenFlow testbed using free Mininet network emulation software. The testbed emulates ECMP leaf and spine data center fabrics and provides a platform for experimenting with analytics driven feedback control using the sFlow-RT hybrid OpenFlow controller.

First build an Ubuntu 13.04 / 13.10 virtual machine then follow instructions for installing Mininet - Option 3: Installation from Packages.

Next, install an Apache web server:
sudo apt-get install apache2
Install the sFlow-RT integrated hybrid OpenFlow controller, either on the Mininet virtual machine, or on a different system (Java 1.6+ is required to run sFlow-RT):
 Continue reading

Configuring Mellanox switches

The following commands configure a Mellanox switch (10.0.0.252) to sample packets at 1-in-10000, poll counters every 30 seconds and send sFlow to an analyzer (10.0.0.50) using the default sFlow port 6343:
sflow enable
sflow agent-ip 10.0.0.252
sflow collector-ip 10.0.0.50
sflow sampling-rate 10000
sflow counter-poll-interval 30
For each interface:
interface ethernet 1/1 sflow enable
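Since the interface command has to be repeated for every port, a short script can generate the lines to paste into the switch. This is a minimal sketch: the `1/<port>` naming simply extends the single example above, and the port range and function name are assumptions for illustration, so check the actual port numbering on the switch before use.

```javascript
// Generate the per-interface sFlow command for ports 1/first through 1/last.
// The "1/<port>" naming follows the single example in the text; the range
// passed in is an assumption for illustration.
function sflowInterfaceCommands(first, last) {
  const cmds = [];
  for (let port = first; port <= last; port++) {
    cmds.push('interface ethernet 1/' + port + ' sflow enable');
  }
  return cmds;
}

// Example: commands for the first four ports.
const commands = sflowInterfaceCommands(1, 4);
console.log(commands.join('\n'));
```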
A previous posting discussed the selection of sampling rates. Additional information can be found on the Mellanox web site.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

DDoS mitigation hybrid OpenFlow controller

Performance optimizing hybrid OpenFlow controller describes the growing split in the SDN controller market between edge controllers using virtual switches to deliver network virtualization (e.g. VMware NSX, Nuage Networks, Juniper Contrail, etc.) and fabric controllers that optimize performance of the physical network. The article provides an example using InMon's sFlow-RT controller to detect and mark large "elephant" flows so that they don't interfere with latency sensitive small "mice" flows.

This article describes an additional example, using the sFlow-RT controller to implement the ONS 2014 SDN Idol winning distributed denial of service (DDoS) mitigation solution - Real-time SDN Analytics for DDoS mitigation.
Figure 1: ISP/IX Market Segment
Figure 1 shows how service providers are ideally positioned to mitigate large flood attacks directed at their customers. The mitigation solution involves an SDN controller that rapidly detects and filters out attack traffic and protects the customer's Internet access.
Figure 2: Novel DDoS Mitigation solution using Real-time SDN Analytics
Figure 2 shows the elements of the control system in the SDN Idol demonstration. The addition of an embedded OpenFlow controller in sFlow-RT allows the entire DDoS mitigation system to be collapsed into the following sFlow-RT JavaScript application:
// Define large flow  Continue reading

Cisco, ACI, OpFlex and OpenDaylight

Cisco's April 2nd, 2014 announcement - Cisco and Industry Leaders Will Deliver Open, Multi-Vendor, Standards-Based Networks for Application Centric Infrastructure with OpFlex Protocol - has drawn mixed reviews from industry commentators.

In Cisco Submits Its (Very Different) SDN to IETF & OpenDaylight, SDNCentral editor Craig Matsumoto comments, "You know how, early on, people were all worried Cisco would 'take over' OpenDaylight? This is pretty much what they were talking about. It’s not a 'takeover,' literally, but OpFlex and the group policy concept steer OpenDaylight into a new direction that it otherwise wouldn’t have, one that Cisco happens to already have taken."

CIMI Corp. President, Tom Nolle, remarks "We’re all in business to make money, and if Cisco takes a position in a key market like SDN that seems to favor…well…doing nothing much different, you have to assume they have good reason to believe that their approach will resonate with buyers." - Cisco’s OpFlex: We Have Sound AND Fury

This article will look at some of the architectural issues raised by Cisco's announcement based on the following documents:
The diagram at the top of this article illustrates the Continue reading