
Author Archives: Peter

White box Internet router PoC

SDN router using merchant silicon top of rack switch describes how the performance of a software Internet router could be accelerated using the hardware routing capabilities of a commodity switch. This article describes a proof of concept demonstration using Linux virtual machines and a bare metal switch running Cumulus Linux.
The diagram shows the demo setup, providing inter-domain routing between Peer 1 and Peer 2. The Peers are directly connected to the Hardware Switch and ingress packets are routed by the default (0.0.0.0/0) route to the Software Router. The Software Router learns the full set of routes from the Peers using BGP and forwards the packet to the correct next hop router. The packet is then switched to the selected peer router via bridge br_xen.
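As a rough illustration of the forwarding setup (not the actual demo configuration; the next hop address and interface below are placeholders), the default route on a Cumulus Linux switch can be installed and checked with standard Linux commands:

cumulus@switch:~$ # point all unmatched traffic at the software router (placeholder address)
cumulus@switch:~$ sudo ip route add 0.0.0.0/0 via 192.0.2.10
cumulus@switch:~$ ip route show default
default via 192.0.2.10 dev swp1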

The following traceroute run on Peer 1 shows the set of router hops from 192.168.250.1 to 192.168.251.1
[root@peer1 ~]# traceroute -s 192.168.250.1 192.168.251.1
traceroute to 192.168.251.1 (192.168.251.1), 30 hops max, 40 byte packets
1 192.168.152.2 (192.168.152.2) 3.090 ms 3.014 ms 2.927 ms
2 192.168. Continue reading

SDN router using merchant silicon top of rack switch

The talk from David Barroso describes how Spotify optimizes hardware routing on a commodity switch by using sFlow analytics to identify the routes carrying the most traffic. The full Internet routing table contains nearly 600,000 entries, too many for commodity switch hardware to handle. However, not all entries are active all the time. The Spotify solution uses traffic analytics to track the 30,000 most active routes (about 5% of the full routing table) and push them into hardware. Based on Spotify's experience, offloading the 30,000 active routes to the switch provides hardware routing for 99% of their traffic.
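The general approach can be sketched using sFlow-RT's REST API (this is only an illustration, not the SIR implementation; the flow name "dst" and the localhost:8008 address are defaults/placeholders). Traffic is tracked by destination address and the busiest destinations are then aggregated by the controller onto the prefixes learned over BGP:

# track bytes by destination address (the flow name "dst" is arbitrary)
curl -H "Content-Type:application/json" -X PUT \
  --data '{"keys":"ipdestination","value":"bytes"}' \
  http://localhost:8008/flow/dst/json
# list the top destinations currently seen across all switches
curl "http://localhost:8008/activeflows/ALL/dst/json?maxFlows=20"

A controller would map these addresses onto BGP prefixes before selecting the most active prefixes to push into hardware.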

David is interviewed by Ivan Pepelnjak,  SDN ROUTER @ SPOTIFY ON SOFTWARE GONE WILD. The SDN Internet Router (SIR) source code and documentation is available on GitHub.
The diagram from David's talk shows the overall architecture of the solution. Initially the Internet Router (commodity switch hardware) uses a default route to direct outbound traffic to a Transit Provider (capable of handling all the outbound traffic). The BGP Controller learns routes via BGP and observes traffic using the standard sFlow measurement technology embedded in most commodity switch silicon.
After a period (1 hour) the BGP Controller identifies the most active 30,000 prefixes and Continue reading

WAN optimization using real-time traffic analytics

TATA Consultancy Services white paper, Actionable Intelligence in the SDN Ecosystem: Optimizing Network Traffic through FRSA, demonstrates how real-time traffic analytics and SDN can be combined to perform real-time traffic engineering of large flows across a WAN infrastructure.
The architecture being demonstrated is shown in the diagram (this diagram has been corrected - the diagram in the white paper incorrectly states that sFlow-RT analytics software uses a REST API to poll the nodes in the topology. In fact, the nodes stream telemetry using the widely supported, industry standard, sFlow protocol, providing real-time visibility and scaleability that would be difficult to achieve using polling - see Push vs Pull).

The load balancing application receives real-time notifications of large flows from the sFlow-RT analytics software and programs the SDN Controller (in this case OpenDaylight) to push forwarding rules to the switches to direct the large flows across a specific path. Flow Aware Real-time SDN Analytics (FRSA) provides an overview of the basic ideas behind large flow traffic engineering that inspired this use case.
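As a hedged sketch of how such large flow notifications can be produced (the flow name, threshold name, and values below are illustrative, not taken from the white paper), sFlow-RT can be configured through its REST API to define a flow and raise events when it exceeds a threshold:

# define TCP flows keyed on the 4-tuple, measured in bytes per second
curl -H "Content-Type:application/json" -X PUT \
  --data '{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport","value":"bytes"}' \
  http://localhost:8008/flow/tcp/json
# flag flows exceeding roughly 1Gbit/s (125,000,000 bytes per second)
curl -H "Content-Type:application/json" -X PUT \
  --data '{"metric":"tcp","value":125000000}' \
  http://localhost:8008/threshold/elephant/json
# the load balancing application long-polls for threshold events to act on
curl "http://localhost:8008/events/json?maxEvents=10&timeout=60"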

While OpenDaylight is used in this example, an interesting alternative for this use case would be the ONOS SDN controller running the Segment Routing application. ONOS Continue reading

Optimizing software defined data center

The recent Fortune magazine article, Software-defined data center market to hit $77.18 billion by 2020, starts with the quote "Data centers are no longer just about all the hardware gear you can stitch together for better operations. There’s a lot of software involved to squeeze more performance out of your hardware, and all that software is expected to contribute to a burgeoning new market dubbed the software-defined data center."

The recent ONS2015 Keynote from Google's Amin Vahdat describes how Google builds large scale software defined data centers. The presentation is well worth watching in its entirety since Google has a long history of advancing distributed computing with technologies that have later become mainstream.
There are a number of points in the presentation that relate to the role of networking in the performance of cloud applications. Amin states, "Networking is at this inflection point and what computing means is going to be largely determined by our ability to build great networks over the coming years. In this world data center networking in particular is a key differentiator."

This slide shows the large pools of storage and compute connected by the data center network that are used Continue reading

Leaf and spine traffic engineering using segment routing and SDN


The short 3 minute video is a live demonstration showing how software defined networking (SDN) can be used to orchestrate the measurement and control capabilities of commodity data center switches to automatically load balance traffic on a 4 leaf, 4 spine, 10 Gigabit leaf and spine network.
The diagram shows the physical layout of the demonstration rack. The four logical racks with their servers and leaf switches are combined in a single physical rack, along with the spine switches and SDN controllers. All the links in the data plane are 10G, and sFlow has been enabled on every switch and link with the following settings: a packet sampling rate of 1-in-8192 and a counter polling interval of 20 seconds. The switches have been configured to send the sFlow data to the sFlow-RT analytics software running on Controller 1.
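For reference, if the switches use the open source Host sFlow agent (as Cumulus Linux switches do), settings like these would be expressed in /etc/hsflowd.conf; the collector address below is a placeholder for Controller 1:

cumulus@leaf1:~$ cat /etc/hsflowd.conf
sflow {
  # 1-in-8192 packet sampling
  sampling = 8192
  # counter export every 20 seconds
  polling = 20
  # send measurements to sFlow-RT on Controller 1 (placeholder address)
  collector {
    ip = 10.0.0.251
    udpport = 6343
  }
}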

The switches are also configured to enable OpenFlow 1.3 and connect to multiple controllers in the redundant ONOS SDN controller cluster running on Controller 1 and Controller 2.
The charts from The Nature of Datacenter Traffic: Measurements & Analysis show data center traffic measurements published by Microsoft. Most traffic flows are short duration. However, combined they consume less bandwidth than a much smaller number of Continue reading

Analytics and SDN

Recent presentations from AT&T and Google describe SDN/NFV architectures that incorporate measurement based feedback in order to improve performance and reliability.

The first slide is from a presentation by AT&T's Margaret Chiosi, SDN+NFV Next Steps in the Journey, NFV World Congress 2015. The future architecture envisions generic (white box) hardware providing a stream of analytics that is compared to policies and used to drive actions to assure service levels.


The second slide is from the presentation by Google's Bikash Koley at the Silicon Valley Software Defined Networking Group Meetup. In this architecture, "network state changes observed by analyzing comprehensive time-series data stream." Telemetry is used to verify that the network is behaving as intended, identifying policy violations so that the management and control planes can apply corrective actions. Again, the software defined network is built from commodity white box switches.

Support for standard sFlow measurements is almost universally available in commodity switch hardware. sFlow agents embedded within network devices continuously stream measurements to the SDN controller, supplying the analytics component with the comprehensive, scaleable, real-time visibility needed for effective control.
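For example, once agents are configured, the set of devices streaming measurements to an sFlow-RT instance can be confirmed with a single REST call (localhost and port 8008 are sFlow-RT's defaults):

# list the switches and hosts currently sending sFlow
curl http://localhost:8008/agents/json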

SDN fabric controller for commodity data center switches describes the measurement and control capabilities available in commodity switch hardware. Continue reading

Big Tap sFlow: Enabling Pervasive Flow-level Visibility


Today's Big Switch Networks webinar, Big Tap sFlow: Enabling Pervasive Flow-level Visibility, describes how Big Switch uses software defined networking (SDN) to control commodity switches and deliver network visibility. The webinar presents a live demonstration showing how real-time sFlow analytics is used to automatically drive SDN actions to provide a "smarter way to find a needle in a haystack."

The video presentation covers the following topics:

  • 0:00 Introduction to Big Tap
  • 7:00 sFlow generation and use cases
  • 12:30 Demonstration of real-time tap triggering based on sFlow

The webinar describes how the network wide monitoring provided by industry standard sFlow instrumentation complements the Big Tap SDN controller's ability to capture and direct selected packet streams to visibility tools.

The above slide from the webinar draws an analogy for the role that sFlow plays in targeting the capture network to that of a finderscope, the small, wide-angle telescope used to provide an overview of the sky and guide the telescope to its target. Support for the sFlow measurement standard is built into commodity switch hardware and is enabled on all ports in the capture network to provide a wide angle view of all traffic in the data center. Once Continue reading

OpenNetworking.tv interview


The OpenNetworking.tv interview includes a wide ranging discussion of current trends in software defined networking (SDN), including: merchant silicon, analytics, probes, scaleability, Open vSwitch, network virtualization, VxLAN, network function virtualization (NFV), Open Compute Project, white box / bare metal switches, leaf and spine topologies, large "Elephant" flow marking and steering, Cumulus Linux, Big Switch, orchestration, Puppet and Chef.

The interview and full transcript are available on SDxCentral: sFlow Creator Peter Phaal On Taming The Wilds Of SDN & Virtual Networking

Related articles on this blog include:

ECMP visibility with Cumulus Linux

Demo: Implementing the Big Data Design Guide in the Cumulus Workbench  is a great demonstration of the power of zero touch provisioning and automation. When the switches and servers boot they automatically pick up their operating systems and configurations for the complex Equal Cost Multi-Path (ECMP) routed network shown in the diagram.

Topology discovery with Cumulus Linux looked at an alternative Multi-Chassis Link Aggregation (MLAG) configuration and shows how to extract the configuration and monitor traffic on the network using sFlow and Fabric View.

The paper Hedera: Dynamic Flow Scheduling for Data Center Networks describes the impact of colliding flows on effective ECMP cross sectional bandwidth. The paper gives an example demonstrating that effective cross sectional bandwidth can be reduced by between 20% and 60%, depending on the number of simultaneous flows per host.

This article uses the workbench to demonstrate the effect of large "Elephant" flow collisions on network throughput. The following script running on each of the servers uses the iperf tool to generate pairs of overlapping Elephant flows:
cumulus@server1:~$ while true; do iperf -c 10.4.2.2 -t 20; sleep 20; done
------------------------------------------------------------
Client connecting to 10.4.2.2, TCP port Continue reading

Topology discovery with Cumulus Linux

Demo: Implementing the OpenStack Design Guide in the Cumulus Workbench is a great demonstration of the power of zero touch provisioning and automation. When the switches and servers boot they automatically pick up their operating systems and configurations for the complex network shown in the diagram.
REST API for Cumulus Linux ACLs describes a REST server for remotely controlling ACLs on Cumulus Linux. This article will discuss recently added topology discovery methods that allow an SDN controller to learn the topology and apply targeted controls (e.g. large "Elephant" flow marking, large flow steering, DDoS mitigation, etc.).

Prescriptive Topology Manager

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides a simple method of verifying and enforcing correct wiring topologies.
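PTM reads the intended cabling plan from a graphviz dot file; a minimal example is shown below (file contents abbreviated, file location as on Cumulus Linux), matching the JSON returned by the REST call that follows:

cumulus@leaf1:~$ cat /etc/ptm.d/topology.dot
graph G {
  "leaf1":"swp1s0" -- "spine1":"swp49";
  ...
}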

The following REST call converts the topology from PTM's dot notation and returns a JSON representation:
cumulus@wbench:~$ curl http://leaf1:8080/ptm
Returns the result:
{
  "links": {
    "L1": {
      "node1": "leaf1",
      "node2": "spine1",
      "port1": "swp1s0",
      "port2": "swp49"
    },
    ...
  }
}

LLDP

Prescriptive Topology Manager is preferred since it ensures that the discovered topology is correct. However, PTM builds on the basic Link Layer Discovery Protocol (LLDP), which provides an alternative method of topology Continue reading

Broadcom ASIC table utilization metrics, DevOps, and SDN

Figure 1: Two-Level Folded CLOS Network Topology Example
Figure 1 from the Broadcom white paper, Engineered Elephant Flows for Boosting Application Performance in Large-Scale CLOS Networks, shows a data center leaf and spine topology. Leaf and spine networks are seeing rapid adoption since they provide the scaleability needed to cost effectively deliver the low latency, high bandwidth interconnect for cloud, big data, and high performance computing workloads.

Broadcom Trident ASICs are popular in white box, brite-box and branded data center switches from a wide range of vendors, including: Accton, Agema, Alcatel-Lucent, Arista, Cisco, Dell, Edge-Core, Extreme, Hewlett-Packard, IBM, Juniper, Penguin Computing, and Quanta.
Figure 2: OF-DPA Programming Pipeline for ECMP
Figure 2 shows the packet processing pipeline of a Broadcom ASIC. The pipeline consists of a number of linked hardware tables providing bridging, routing, access control list (ACL), and ECMP forwarding group functions. Operations teams need to be able to proactively monitor table utilizations in order to avoid performance problems associated with table exhaustion.

Broadcom's recently released sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scaleable, multi-vendor, network wide visibility into the utilization of these hardware tables.

Support for Continue reading

Cloud analytics

Librato is an example of a cloud based analytics service (now part of SolarWinds). Librato provides an easy to use REST API for pushing metrics into their cloud service. The web portal makes it simple to combine and trend data, and to build and share dashboards.

This article describes a proof of concept demonstrating how Librato's cloud service can be used to cost effectively monitor large scale cloud infrastructure by leveraging standard sFlow instrumentation. Librato offers a free 30 day trial, making it easy to evaluate solutions based on this demonstration.
The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Metrics are pushed from sFlow-RT to Librato using the REST API.
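To give a concrete feel for the Librato end of the pipeline (sFlow-RT handles this forwarding itself; the metric name, source, and credentials below are placeholders), a gauge can be pushed to the service with a single authenticated POST:

curl -u $LIBRATO_USER:$LIBRATO_TOKEN \
  -H "Content-Type: application/json" \
  -X POST https://metrics-api.librato.com/v1/metrics \
  -d '{"gauges":[{"name":"load_one","value":0.7,"source":"server1"}]}'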

Over 40 vendors implement the sFlow standard and compatible products are listed on sFlow.org. The open source Host sFlow agent exports standard sFlow metrics from hosts. For additional background, the Velocity conference talk provides an introduction to sFlow and a case study from a large social networking site.


Librato's service is priced based on the number of data points that they need to store. For example, a Host sFlow agent Continue reading

Fabric visibility with Arista EOS

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try: just register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance, and this article will describe how to obtain the topology from a fabric of Arista Networks switches.

Arista EOS™ includes the eAPI JSON-RPC service for programmatic monitoring and control. The article Arista Continue reading
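As a hedged example of the eAPI mechanism (the switch name, credentials, and the command shown are placeholders), a JSON-RPC request is a simple HTTPS POST:

curl -s -k -u admin:password \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"runCmds","params":{"version":1,"cmds":["show lldp neighbors"],"format":"json"},"id":"1"}' \
  https://switch1/command-api

Parsing the LLDP neighbor data returned by calls like this is one way to assemble the topology that Fabric View requires.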

Open vSwitch performance monitoring

Credit: Accelerating Open vSwitch to “Ludicrous Speed”
Accelerating Open vSwitch to "Ludicrous Speed" describes the architecture of Open vSwitch. When a packet arrives, the OVS Kernel Module checks its cache to see if there is an entry that matches the packet. If there is a match then the packet is forwarded within the kernel. Otherwise, the packet is sent to the user space ovs-vswitchd process to determine the forwarding decision based on the set of OpenFlow rules that have been installed or, if no rules are found, by passing the packet to an OpenFlow controller. Once a forwarding decision has been made, the packet and the forwarding actions are passed back to the OVS Kernel Module which caches the decision and forwards the packet. Subsequent packets in the flow will then be matched by the cache and forwarded within the kernel.
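The behavior described above can be observed directly on a host; for example, ovs-dpctl reports the kernel cache hit and miss counts (output abbreviated, values illustrative):

$ sudo ovs-dpctl show
system@ovs-system:
  lookups: hit:1987502 missed:1024 lost:0
  flows: 18
  ...

A high missed count relative to hits indicates that many packets are taking the slow path through the user space ovs-vswitchd process.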

The recent Open vSwitch 2014 Fall Conference included the talk, Managing Open vSwitch across a large heterogeneous fleet by Chad Norgan, describing Rackspace's experience with running a large scale OpenStack deployment using Open vSwitch for network virtualization. The talk describes the key metrics that Rackspace collects to monitor the performance of the large pools of Open vSwitch instances.

Continue reading

OpenFlow integration

Northbound APIs for traffic engineering describes how sFlow and OpenFlow provide complementary monitoring and control capabilities that can be combined to create software defined networking (SDN) solutions that automatically adapt the network to changing traffic and address high value use cases such as: DDoS mitigation, enforcing black lists, ECMP load balancing, and packet brokers.

The article describes the challenge of mapping between the different methods used by sFlow and OpenFlow to identify switch ports:
  • Agent IP address ⟷ OpenFlow switch ID
  • SNMP ifIndex ⟷ OpenFlow port ID
The recently published sFlow OpenFlow Structures extension addresses the challenge by providing a way for switches to export the mapping as an sFlow structure.

Open vSwitch recently implemented the extension, unifying visibility and control of the virtual network edge. In addition, most physical switches that support OpenFlow also support sFlow. Ask vendors about their plans to implement the sFlow OpenFlow Structures extension since it is a key enabler for SDN control applications.

Hybrid OpenFlow ECMP testbed


SDN fabric controller for commodity data center switches describes how the real-time visibility and hybrid control capabilities of commodity data center switches can be used to automatically adapt the network to changing traffic patterns and optimize performance. The article identifies hybrid OpenFlow as a critical component of the solution, allowing SDN to be combined with proven distributed routing protocols (e.g. BGP, ISIS, OSPF, etc) to deliver scaleable, production ready solutions that fully leverage the capabilities of commodity hardware.

This article will take the example of large flow marking that has been demonstrated using physical switches and show how Mininet can be used to emulate hybrid control of data center networks and deliver realistic results.
The article Elephant Detection in Virtual Switches & Mitigation in Hardware describes a demonstration by VMware and Cumulus Networks that shows how real-time detection and marking of large "Elephant" flows can dramatically improve application response time for small latency sensitive "Mouse" flows without impacting the throughput of the Elephants - see Marking large flows for additional background.
Performance optimizing hybrid OpenFlow controller demonstrated how hybrid OpenFlow can be used to mark Elephant flows on a top of rack switch. However, building test networks with physical Continue reading

REST API for Cumulus Linux ACLs

RESTful control of Cumulus Linux ACLs included a proof of concept script that demonstrated how to remotely control iptables entries in Cumulus Linux.  Cumulus Linux in turn converts the standard Linux iptables rules into the hardware ACLs implemented by merchant silicon switch ASICs to deliver line rate filtering.
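For background, the rules themselves are ordinary iptables syntax placed in Cumulus Linux's ACL policy directory and installed with cl-acltool; a minimal hand-written example (the file name and source address are placeholders):

cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/60-example.rules
[iptables]
# drop traffic arriving on any switch port from a placeholder source address
-A FORWARD --in-interface swp+ -s 192.0.2.1 -j DROP

cumulus@switch:~$ sudo cl-acltool -i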

Previous blog posts demonstrated how remote control of Cumulus Linux ACLs can be used for DDoS mitigation and Large "Elephant" flow marking.

A more advanced version of the script is now available on GitHub:

https://github.com/pphaal/acl_server/

The new script adds the following features:
  1. It now runs as a daemon.
  2. Exceptions generated by cl-acltool are caught and handled.
  3. Rules are compiled asynchronously, reducing the response time of REST calls.
  4. Updates are batched, supporting hundreds of operations per second.
The script doesn't provide any security, which may be acceptable if access to the REST API is limited to the management port, but is generally unacceptable for production deployments.

Fortunately, Cumulus Linux is an open Linux distribution that allows additional software components to be installed. Rather than being forced to add authentication and encryption to the script, it is possible to install additional software and leverage the capabilities of a mature web server such as Apache. Continue reading

Fabric visibility with Cumulus Linux

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try: just register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance, and this article will describe how to obtain the topology from a Cumulus Networks fabric.

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides Continue reading

DDoS flood protection


Denial of Service attacks represent a significant threat to the on-going operations of many businesses. When most revenue is derived from on-line operations, a DDoS attack can put a company out of business. There are many flavors of DDoS attack, but the objective is always the same: to saturate a resource, such as a router, switch, firewall or web server, with a large number of simultaneous bogus requests from many different sources. These attacks generate large volumes of traffic (100Gbit/s attacks are now common), making mitigation a challenge.

The 3 minute video demonstrates Flood Protect - a DDoS mitigation solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time detection and mitigation of DDoS attacks. Flood Protect is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications provide visibility and accelerate fabric performance, applying controls that reduce latency and increase throughput.
An early version of Flood Protect won the 2014 SDN Idol competition in a joint demonstration with Brocade.
Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Stop thief!

The Host sFlow project recently added CPU steal to the set of CPU metrics it exports.
steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in other operating systems
when running in a virtualized environment
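On a Linux guest the underlying counter is the eighth value on the aggregate cpu line of /proc/stat, i.e. the ninth whitespace-delimited field once the "cpu" label is counted; for example (the value shown is illustrative):

$ grep '^cpu ' /proc/stat | awk '{print "steal jiffies:", $9}'
steal jiffies: 1842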
Keeping close track of the stolen time metric is particularly important when managing virtual machines in a public cloud. For example, Netflix and Stolen Time includes the discussion:
So how does Netflix handle this problem when using Amazon’s Cloud? Adrian admits that they tracked this statistic so closely that when an instance crossed a stolen time threshold the standard operating procedure at Netflix was to kill the VM and start it up on a different hypervisor. What Netflix realized over time was that once a VM was performing poorly because another VM was crashing the party, usually due to a poorly written or compute intensive application hogging the machine, it never really got any better and their best learned approach was to get off that machine.
The following articles describe how to monitor public cloud instances using Host sFlow agents:
The CPU steal metric is particularly relevant to Network Function Virtualization (NFV). Virtual Continue reading