
Author Archives: Peter

White box Internet router PoC

SDN router using merchant silicon top of rack switch describes how the performance of a software Internet router could be accelerated using the hardware routing capabilities of a commodity switch. This article describes a proof of concept demonstration using Linux virtual machines and a bare metal switch running Cumulus Linux.
The diagram shows the demo setup, providing inter-domain routing between Peer 1 and Peer 2. The Peers are directly connected to the Hardware Switch and ingress packets are routed by the default (0.0.0.0/0) route to the Software Router. The Software Router learns the full set of routes from the Peers using BGP and forwards the packet to the correct next hop router. The packet is then switched to the selected peer router via bridge br_xen.
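As a rough illustration of the forwarding setup (not the actual demo configuration; the next hop address and interface below are placeholders), the default route on a Cumulus Linux switch can be installed and checked with standard Linux commands:

cumulus@switch:~$ # point all unmatched traffic at the software router (placeholder address)
cumulus@switch:~$ sudo ip route add 0.0.0.0/0 via 192.0.2.10
cumulus@switch:~$ ip route show default
default via 192.0.2.10 dev swp1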

The following traceroute run on Peer 1 shows the set of router hops from 192.168.250.1 to 192.168.251.1
[root@peer1 ~]# traceroute -s 192.168.250.1 192.168.251.1
traceroute to 192.168.251.1 (192.168.251.1), 30 hops max, 40 byte packets
1 192.168.152.2 (192.168.152.2) 3.090 ms 3.014 ms 2.927 ms
2 192.168. Continue reading

SDN router using merchant silicon top of rack switch

The talk from David Barroso describes how Spotify optimizes hardware routing on a commodity switch by using sFlow analytics to identify the routes carrying the most traffic. The full Internet routing table contains nearly 600,000 entries, too many for commodity switch hardware to handle. However, not all entries are active all the time. The Spotify solution uses traffic analytics to track the 30,000 most active routes (about 5% of the full routing table) and push them into hardware. Based on Spotify's experience, offloading the 30,000 active routes to the switch provides hardware routing for 99% of their traffic.
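The general approach can be sketched using sFlow-RT's REST API (this is only an illustration, not the SIR implementation; the flow name "dst" and the localhost:8008 address are defaults/placeholders). Traffic is tracked by destination address and the busiest destinations are then aggregated by the controller onto the prefixes learned over BGP:

# track bytes by destination address (the flow name "dst" is arbitrary)
curl -H "Content-Type:application/json" -X PUT \
  --data '{"keys":"ipdestination","value":"bytes"}' \
  http://localhost:8008/flow/dst/json
# list the top destinations currently seen across all switches
curl "http://localhost:8008/activeflows/ALL/dst/json?maxFlows=20"

A controller would map these addresses onto BGP prefixes before selecting the most active prefixes to push into hardware.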

David is interviewed by Ivan Pepelnjak,  SDN ROUTER @ SPOTIFY ON SOFTWARE GONE WILD. The SDN Internet Router (SIR) source code and documentation is available on GitHub.
The diagram from David's talk shows the overall architecture of the solution. Initially the Internet Router (commodity switch hardware) uses a default route to direct outbound traffic to a Transit Provider (capable of handling all the outbound traffic). The BGP Controller learns routes via BGP and observes traffic using the standard sFlow measurement technology embedded in most commodity switch silicon.
After a period (1 hour) the BGP Controller identifies the most active 30,000 prefixes and Continue reading

WAN optimization using real-time traffic analytics

TATA Consultancy Services white paper, Actionable Intelligence in the SDN Ecosystem: Optimizing Network Traffic through FRSA, demonstrates how real-time traffic analytics and SDN can be combined to perform real-time traffic engineering of large flows across a WAN infrastructure.
The architecture being demonstrated is shown in the diagram (this diagram has been corrected - the diagram in the white paper incorrectly states that sFlow-RT analytics software uses a REST API to poll the nodes in the topology. In fact, the nodes stream telemetry using the widely supported, industry standard, sFlow protocol, providing real-time visibility and scaleability that would be difficult to achieve using polling - see Push vs Pull).

The load balancing application receives real-time notifications of large flows from the sFlow-RT analytics software and programs the SDN Controller (in this case OpenDaylight) to push forwarding rules to the switches to direct the large flows across a specific path. Flow Aware Real-time SDN Analytics (FRSA) provides an overview of the basic ideas behind large flow traffic engineering that inspired this use case.
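As a hedged sketch of how such large flow notifications can be produced (the flow name, threshold name, and values below are illustrative, not taken from the white paper), sFlow-RT can be configured through its REST API to define a flow and raise events when it exceeds a threshold:

# define TCP flows keyed on the 4-tuple, measured in bytes per second
curl -H "Content-Type:application/json" -X PUT \
  --data '{"keys":"ipsource,ipdestination,tcpsourceport,tcpdestinationport","value":"bytes"}' \
  http://localhost:8008/flow/tcp/json
# flag flows exceeding roughly 1Gbit/s (125,000,000 bytes per second)
curl -H "Content-Type:application/json" -X PUT \
  --data '{"metric":"tcp","value":125000000}' \
  http://localhost:8008/threshold/elephant/json
# the load balancing application long-polls for threshold events to act on
curl "http://localhost:8008/events/json?maxEvents=10&timeout=60"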

While OpenDaylight is used in this example, an interesting alternative for this use case would be the ONOS SDN controller running the Segment Routing application. ONOS Continue reading

Optimizing software defined data center

The recent Fortune magazine article, Software-defined data center market to hit $77.18 billion by 2020, starts with the quote "Data centers are no longer just about all the hardware gear you can stitch together for better operations. There’s a lot of software involved to squeeze more performance out of your hardware, and all that software is expected to contribute to a burgeoning new market dubbed the software-defined data center."

The recent ONS2015 Keynote from Google's Amin Vahdat describes how Google builds large scale software defined data centers. The presentation is well worth watching in its entirety since Google has a long history of advancing distributed computing with technologies that have later become mainstream.
There are a number of points in the presentation that relate to the role of networking in the performance of cloud applications. Amin states, "Networking is at this inflection point and what computing means is going to be largely determined by our ability to build great networks over the coming years. In this world data center networking in particular is a key differentiator."

This slide shows the large pools of storage and compute connected by the data center network that are used Continue reading

Leaf and spine traffic engineering using segment routing and SDN


The short 3 minute video is a live demonstration showing how software defined networking (SDN) can be used to orchestrate the measurement and control capabilities of commodity data center switches to automatically load balance traffic on a 4 leaf, 4 spine, 10 Gigabit leaf and spine network.
The diagram shows the physical layout of the demonstration rack. The four logical racks with their servers and leaf switches are combined in a single physical rack, along with the spine switches and SDN controllers. All the links in the data plane are 10G, and sFlow has been enabled on every switch and link with the following settings: a packet sampling rate of 1-in-8192 and a counter polling interval of 20 seconds. The switches have been configured to send the sFlow data to the sFlow-RT analytics software running on Controller 1.
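For reference, if the switches use the open source Host sFlow agent (as Cumulus Linux switches do), settings like these would be expressed in /etc/hsflowd.conf; the collector address below is a placeholder for Controller 1:

cumulus@leaf1:~$ cat /etc/hsflowd.conf
sflow {
  # 1-in-8192 packet sampling
  sampling = 8192
  # counter export every 20 seconds
  polling = 20
  # send measurements to sFlow-RT on Controller 1 (placeholder address)
  collector {
    ip = 10.0.0.251
    udpport = 6343
  }
}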

The switches are also configured to enable OpenFlow 1.3 and connect to multiple controllers in the redundant ONOS SDN controller cluster running on Controller 1 and Controller 2.
The charts from The Nature of Datacenter Traffic: Measurements & Analysis show data center traffic measurements published by Microsoft. Most traffic flows are short duration. However, combined they consume less bandwidth than a much smaller number of Continue reading

Analytics and SDN

Recent presentations from AT&T and Google describe SDN/NFV architectures that incorporate measurement based feedback in order to improve performance and reliability.

The first slide is from a presentation by AT&T's Margaret Chiosi, SDN+NFV Next Steps in the Journey, NFV World Congress 2015. The future architecture envisions generic (white box) hardware providing a stream of analytics that is compared to policies and used to drive actions to assure service levels.


The second slide is from the presentation by Google's Bikash Koley at the Silicon Valley Software Defined Networking Group Meetup. In this architecture, "network state changes observed by analyzing comprehensive time-series data stream." Telemetry is used to verify that the network is behaving as intended, identifying policy violations so that the management and control planes can apply corrective actions. Again, the software defined network is built from commodity white box switches.

Support for standard sFlow measurements is almost universally available in commodity switch hardware. sFlow agents embedded within network devices continuously stream measurements to the SDN controller, supplying the analytics component with the comprehensive, scaleable, real-time visibility needed for effective control.
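For example, once agents are configured, the set of devices streaming measurements to an sFlow-RT instance can be confirmed with a single REST call (localhost and port 8008 are sFlow-RT's defaults):

# list the switches and hosts currently sending sFlow
curl http://localhost:8008/agents/json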

SDN fabric controller for commodity data center switches describes the measurement and control capabilities available in commodity switch hardware. Continue reading

Big Tap sFlow: Enabling Pervasive Flow-level Visibility


Today's Big Switch Networks webinar, Big Tap sFlow: Enabling Pervasive Flow-level Visibility, describes how Big Switch uses software defined networking (SDN) to control commodity switches and deliver network visibility. The webinar presents a live demonstration showing how real-time sFlow analytics is used to automatically drive SDN actions to provide a "smarter way to find a needle in a haystack."

The video presentation covers the following topics:

  • 0:00 Introduction to Big Tap
  • 7:00 sFlow generation and use cases
  • 12:30 Demonstration of real-time tap triggering based on sFlow

The webinar describes how the network wide monitoring provided by industry standard sFlow instrumentation complements the Big Tap SDN controller's ability to capture and direct selected packet streams to visibility tools.

The above slide from the webinar draws an analogy for the role that sFlow plays in targeting the capture network to that of a finderscope, the small, wide-angle telescope used to provide an overview of the sky and guide the telescope to its target. Support for the sFlow measurement standard is built into commodity switch hardware and is enabled on all ports in the capture network to provide a wide angle view of all traffic in the data center. Once Continue reading

OpenNetworking.tv interview


The OpenNetworking.tv interview includes a wide ranging discussion of current trends in software defined networking (SDN), including: merchant silicon, analytics, probes, scaleability, Open vSwitch, network virtualization, VxLAN, network function virtualization (NFV), Open Compute Project, white box / bare metal switches, leaf and spine topologies, large "Elephant" flow marking and steering, Cumulus Linux, Big Switch, orchestration, Puppet and Chef.

The interview and full transcript are available on SDxCentral: sFlow Creator Peter Phaal On Taming The Wilds Of SDN & Virtual Networking

Related articles on this blog include:

ECMP visibility with Cumulus Linux

Demo: Implementing the Big Data Design Guide in the Cumulus Workbench  is a great demonstration of the power of zero touch provisioning and automation. When the switches and servers boot they automatically pick up their operating systems and configurations for the complex Equal Cost Multi-Path (ECMP) routed network shown in the diagram.

Topology discovery with Cumulus Linux looked at an alternative Multi-Chassis Link Aggregation (MLAG) configuration and shows how to extract the configuration and monitor traffic on the network using sFlow and Fabric View.

The paper Hedera: Dynamic Flow Scheduling for Data Center Networks describes the impact of colliding flows on effective ECMP cross sectional bandwidth. The paper gives an example demonstrating that effective cross sectional bandwidth can be reduced by between 20% and 60%, depending on the number of simultaneous flows per host.

This article uses the workbench to demonstrate the effect of large "Elephant" flow collisions on network throughput. The following script running on each of the servers uses the iperf tool to generate pairs of overlapping Elephant flows:
cumulus@server1:~$ while true; do iperf -c 10.4.2.2 -t 20; sleep 20; done
------------------------------------------------------------
Client connecting to 10.4.2.2, TCP port Continue reading

Topology discovery with Cumulus Linux

Demo: Implementing the OpenStack Design Guide in the Cumulus Workbench is a great demonstration of the power of zero touch provisioning and automation. When the switches and servers boot they automatically pick up their operating systems and configurations for the complex network shown in the diagram.
REST API for Cumulus Linux ACLs describes a REST server for remotely controlling ACLs on Cumulus Linux. This article will discuss recently added topology discovery methods that allow an SDN controller to learn the topology and apply targeted controls (e.g. large "Elephant" flow marking, large flow steering, DDoS mitigation, etc.).

Prescriptive Topology Manager

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides a simple method of verifying and enforcing correct wiring topologies.
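PTM reads the intended cabling plan from a graphviz dot file; a minimal example is shown below (file contents abbreviated, file location as on Cumulus Linux), matching the JSON returned by the REST call that follows:

cumulus@leaf1:~$ cat /etc/ptm.d/topology.dot
graph G {
  "leaf1":"swp1s0" -- "spine1":"swp49";
  ...
}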

The following REST call converts the topology from PTM's dot notation and returns a JSON representation:
cumulus@wbench:~$ curl http://leaf1:8080/ptm
Returns the result:
{
  "links": {
    "L1": {
      "node1": "leaf1",
      "node2": "spine1",
      "port1": "swp1s0",
      "port2": "swp49"
    },
    ...
  }
}

LLDP

Prescriptive Topology Manager is preferred since it ensures that the discovered topology is correct. However, PTM builds on the basic Link Layer Discovery Protocol (LLDP), which provides an alternative method of topology Continue reading

Broadcom ASIC table utilization metrics, DevOps, and SDN

Figure 1: Two-Level Folded CLOS Network Topology Example
Figure 1 from the Broadcom white paper, Engineered Elephant Flows for Boosting Application Performance in Large-Scale CLOS Networks, shows a data center leaf and spine topology. Leaf and spine networks are seeing rapid adoption since they provide the scaleability needed to cost effectively deliver the low latency, high bandwidth interconnect for cloud, big data, and high performance computing workloads.

Broadcom Trident ASICs are popular in white box, brite-box and branded data center switches from a wide range of vendors, including: Accton, Agema, Alcatel-Lucent, Arista, Cisco, Dell, Edge-Core, Extreme, Hewlett-Packard, IBM, Juniper, Penguin Computing, and Quanta.
Figure 2: OF-DPA Programming Pipeline for ECMP
Figure 2 shows the packet processing pipeline of a Broadcom ASIC. The pipeline consists of a number of linked hardware tables providing bridging, routing, access control list (ACL), and ECMP forwarding group functions. Operations teams need to be able to proactively monitor table utilizations in order to avoid performance problems associated with table exhaustion.

Broadcom's recently released sFlow specification, sFlow Broadcom Switch ASIC Table Utilization Structures, leverages the industry standard sFlow protocol to offer scaleable, multi-vendor, network wide visibility into the utilization of these hardware tables.

Support for Continue reading

Cloud analytics

Librato is an example of a cloud based analytics service (now part of SolarWinds). Librato provides an easy to use REST API for pushing metrics into their cloud service. The web portal makes it simple to combine and trend data, and to build and share dashboards.

This article describes a proof of concept demonstrating how Librato's cloud service can be used to cost effectively monitor large scale cloud infrastructure by leveraging standard sFlow instrumentation. Librato offers a free 30 day trial, making it easy to evaluate solutions based on this demonstration.
The diagram shows the measurement pipeline. Standard sFlow measurements from hosts, hypervisors, virtual machines, containers, load balancers, web servers and network switches stream to the sFlow-RT real-time analytics engine. Metrics are pushed from sFlow-RT to Librato using the REST API.
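To give a concrete feel for the Librato end of the pipeline (sFlow-RT handles this forwarding itself; the metric name, source, and credentials below are placeholders), a gauge can be pushed to the service with a single authenticated POST:

curl -u $LIBRATO_USER:$LIBRATO_TOKEN \
  -H "Content-Type: application/json" \
  -X POST https://metrics-api.librato.com/v1/metrics \
  -d '{"gauges":[{"name":"load_one","value":0.7,"source":"server1"}]}'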

Over 40 vendors implement the sFlow standard and compatible products are listed on sFlow.org. The open source Host sFlow agent exports standard sFlow metrics from hosts. For additional background, the Velocity conference talk provides an introduction to sFlow and a case study from a large social networking site.


Librato's service is priced based on the number of data points that they need to store. For example, a Host sFlow agent Continue reading

Fabric visibility with Arista EOS

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try: just register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance, and this article will describe how to obtain the topology from a fabric of Arista Networks switches.

Arista EOS™ includes the eAPI JSON-RPC service for programmatic monitoring and control. The article Arista Continue reading
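As a hedged example of the eAPI mechanism (the switch name, credentials, and the command shown are placeholders), a JSON-RPC request is a simple HTTPS POST:

curl -s -k -u admin:password \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"runCmds","params":{"version":1,"cmds":["show lldp neighbors"],"format":"json"},"id":"1"}' \
  https://switch1/command-api

Parsing the LLDP neighbor data returned by calls like this is one way to assemble the topology that Fabric View requires.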

Open vSwitch performance monitoring

Credit: Accelerating Open vSwitch to “Ludicrous Speed”
Accelerating Open vSwitch to "Ludicrous Speed" describes the architecture of Open vSwitch. When a packet arrives, the OVS Kernel Module checks its cache to see if there is an entry that matches the packet. If there is a match then the packet is forwarded within the kernel. Otherwise, the packet is sent to the user space ovs-vswitchd process to determine the forwarding decision based on the set of OpenFlow rules that have been installed or, if no rules are found, by passing the packet to an OpenFlow controller. Once a forwarding decision has been made, the packet and the forwarding actions are passed back to the OVS Kernel Module which caches the decision and forwards the packet. Subsequent packets in the flow will then be matched by the cache and forwarded within the kernel.
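The behavior described above can be observed directly on a host; for example, ovs-dpctl reports the kernel cache hit and miss counts (output abbreviated, values illustrative):

$ sudo ovs-dpctl show
system@ovs-system:
  lookups: hit:1987502 missed:1024 lost:0
  flows: 18
  ...

A high missed count relative to hits indicates that many packets are taking the slow path through the user space ovs-vswitchd process.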

The recent Open vSwitch 2014 Fall Conference included the talk, Managing Open vSwitch across a large heterogeneous fleet by Chad Norgan, describing Rackspace's experience with running a large scale OpenStack deployment using Open vSwitch for network virtualization. The talk describes the key metrics that Rackspace collects to monitor the performance of the large pools of Open vSwitch instances.

Continue reading

OpenFlow integration

Northbound APIs for traffic engineering describes how sFlow and OpenFlow provide complementary monitoring and control capabilities that can be combined to create software defined networking (SDN) solutions that automatically adapt the network to changing traffic and address high value use cases such as: DDoS mitigation, enforcing black lists, ECMP load balancing, and packet brokers.

The article describes the challenge of mapping between the different methods used by sFlow and OpenFlow to identify switch ports:
  • Agent IP address ⟷ OpenFlow switch ID
  • SNMP ifIndex ⟷ OpenFlow port ID
The recently published sFlow OpenFlow Structures extension addresses the challenge by providing a way for switches to export the mapping as an sFlow structure.

Open vSwitch recently implemented the extension, unifying visibility and control of the virtual network edge. In addition, most physical switches that support OpenFlow also support sFlow. Ask vendors about their plans to implement the sFlow OpenFlow Structures extension since it is a key enabler for SDN control applications.

Hybrid OpenFlow ECMP testbed


SDN fabric controller for commodity data center switches describes how the real-time visibility and hybrid control capabilities of commodity data center switches can be used to automatically adapt the network to changing traffic patterns and optimize performance. The article identifies hybrid OpenFlow as a critical component of the solution, allowing SDN to be combined with proven distributed routing protocols (e.g. BGP, ISIS, OSPF, etc) to deliver scaleable, production ready solutions that fully leverage the capabilities of commodity hardware.

This article will take the example of large flow marking that has been demonstrated using physical switches and show how Mininet can be used to emulate hybrid control of data center networks and deliver realistic results.
The article Elephant Detection in Virtual Switches & Mitigation in Hardware describes a demonstration by VMware and Cumulus Networks that shows how real-time detection and marking of large "Elephant" flows can dramatically improve application response time for small latency sensitive "Mouse" flows without impacting the throughput of the Elephants - see Marking large flows for additional background.
Performance optimizing hybrid OpenFlow controller demonstrated how hybrid OpenFlow can be used to mark Elephant flows on a top of rack switch. However, building test networks with physical Continue reading

REST API for Cumulus Linux ACLs

RESTful control of Cumulus Linux ACLs included a proof of concept script that demonstrated how to remotely control iptables entries in Cumulus Linux.  Cumulus Linux in turn converts the standard Linux iptables rules into the hardware ACLs implemented by merchant silicon switch ASICs to deliver line rate filtering.
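For background, the rules themselves are ordinary iptables syntax placed in Cumulus Linux's ACL policy directory and installed with cl-acltool; a minimal hand-written example (the file name and source address are placeholders):

cumulus@switch:~$ cat /etc/cumulus/acl/policy.d/60-example.rules
[iptables]
# drop traffic arriving on any switch port from a placeholder source address
-A FORWARD --in-interface swp+ -s 192.0.2.1 -j DROP

cumulus@switch:~$ sudo cl-acltool -i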

Previous blog posts demonstrated how remote control of Cumulus Linux ACLs can be used for DDoS mitigation and Large "Elephant" flow marking.

A more advanced version of the script is now available on GitHub:

https://github.com/pphaal/acl_server/

The new script adds the following features:
  1. It now runs as a daemon.
  2. Exceptions generated by cl-acltool are caught and handled.
  3. Rules are compiled asynchronously, reducing the response time of REST calls.
  4. Updates are batched, supporting hundreds of operations per second.
The script doesn't provide any security, which may be acceptable if access to the REST API is limited to the management port, but is generally unacceptable for production deployments.

Fortunately, Cumulus Linux is an open Linux distribution that allows additional software components to be installed. Rather than being forced to add authentication and encryption to the script, it is possible to install additional software and leverage the capabilities of a mature web server such as Apache. Continue reading

Fabric visibility with Cumulus Linux

A leaf and spine fabric is challenging to monitor. The fabric spreads traffic across all the switches and links in order to maximize bandwidth. Unlike traditional hierarchical network designs, where a small number of links can be monitored to provide visibility, a leaf and spine network has no special links or switches where running CLI commands or attaching a probe would provide visibility. Even if it were possible to attach probes, the effective bandwidth of a leaf and spine network can be as high as a Petabit/second, well beyond the capabilities of current generation monitoring tools.

The 2 minute video provides an overview of some of the performance challenges with leaf and spine fabrics and demonstrates Fabric View - a monitoring solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time visibility into fabric performance.

Fabric View is free to try: just register at http://www.myinmon.com/ and request an evaluation. The software requires an accurate network topology in order to characterize performance, and this article will describe how to obtain the topology from a Cumulus Networks fabric.

Complex Topology and Wiring Validation in Data Centers describes how Cumulus Networks' prescriptive topology manager (PTM) provides Continue reading

DDoS flood protection


Denial of Service attacks represent a significant threat to the on-going operations of many businesses. When most revenue is derived from on-line operations, a DDoS attack can put a company out of business. There are many flavors of DDoS attack, but the objective is always the same: to saturate a resource, such as a router, switch, firewall or web server, with a large number of simultaneous bogus requests from many different sources. These attacks generate large volumes of traffic (100Gbit/s attacks are now common), making mitigation a challenge.

The 3 minute video demonstrates Flood Protect - a DDoS mitigation solution that leverages industry standard sFlow instrumentation in commodity data center switches to provide real-time detection and mitigation of DDoS attacks. Flood Protect is an application running on InMon's Switch Fabric Accelerator SDN controller. Other applications provide visibility and accelerate fabric performance, applying controls that reduce latency and increase throughput.
An early version of Flood Protect won the 2014 SDN Idol competition in a joint demonstration with Brocade.
Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.

Stop thief!

The Host sFlow project recently added CPU steal to the set of CPU metrics it exports.
steal (since Linux 2.6.11)
(8) Stolen time, which is the time spent in other operating systems
when running in a virtualized environment
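On a Linux guest the underlying counter is the eighth value on the aggregate cpu line of /proc/stat, i.e. the ninth whitespace-delimited field once the "cpu" label is counted; for example (the value shown is illustrative):

$ grep '^cpu ' /proc/stat | awk '{print "steal jiffies:", $9}'
steal jiffies: 1842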
Keeping close track of the stolen time metric is particularly important when managing virtual machines in a public cloud. For example, Netflix and Stolen Time includes the discussion:
So how does Netflix handle this problem when using Amazon’s Cloud? Adrian admits that they tracked this statistic so closely that when an instance crossed a stolen time threshold the standard operating procedure at Netflix was to kill the VM and start it up on a different hypervisor. What Netflix realized over time was that once a VM was performing poorly because another VM was crashing the party, usually due to a poorly written or compute intensive application hogging the machine, it never really got any better and their best learned approach was to get off that machine.
The following articles describe how to monitor public cloud instances using Host sFlow agents:
The CPU steal metric is particularly relevant to Network Function Virtualization (NFV). Virtual Continue reading