Peter, Author at NetworkingNexus.net

Peter

Author Archives: Peter

Collecting Docker Swarm service metrics

This article demonstrates how to address the challenge of monitoring dynamic Docker Swarm deployments and track service performance metrics using existing on-premises and cloud monitoring tools like Ganglia, Graphite, InfluxDB, Grafana, SignalFX, Librato, etc.

In this example, Docker Swarm is used to deploy a simple web service on a four node cluster:

docker service create --replicas 2 -p 80:80 --name apache httpd:2.4

Next, the following script tests the agility of monitoring systems by constantly changing the number of replicas in the service:

#!/bin/bash
while true
do
  docker service scale apache=$(( ( RANDOM % 20 )  + 1 ))
  sleep 30
done

The above test is easy to set up and is a quick way to stress test monitoring systems and reveal accuracy and performance problems when they are confronted with container workloads.

Many approaches to gathering and recording metrics were developed for static environments and have a great deal of difficulty tracking rapidly changing container-based service pools without missing information, leaking resources, and slowing down. For example, each new container in Docker Swarm has unique name, e.g. apache.16.17w67u9157wlri7trd854x6q0. Monitoring solutions that record container names, or even worse, index data by container name, will suffer from bloated Continue reading

Docker 1.12 swarm mode elastic load balancing

Docker Built-In Orchestration Ready For Production: Docker 1.12 Goes GA describes the native swarm mode feature that integrates cluster management, virtual networking, and policy based deployment of services.

This article will demonstrate how real-time streaming telemetry can be used to construct an elastic load balancing solution that dynamically adjusts service capacity to match changing demand.

Getting started with swarm mode describes the steps to configure a swarm cluster. For example, following command issued on any of the Manager nodes deploys a web service on the cluster:

docker service create --replicas 2 -p 80:80 --name apache httpd:2.4

And the following command raises the number of containers in the service pool from 2 to 4:

docker service scale apache=4

Asynchronous Docker metrics describes how sFlow telemetry provides the real-time visibility required for elastic load balancing. The diagram shows how streaming telemetry allows the sFlow-RT controller to determine the load on the service pool so that it can use the Docker service API to automatically increase or decrease the size of the pool as demand changes. Elastic load balancing of the service pools ensures consistent service levels by adding additional resources if demand increases. In addition, efficiency is improved by releasing resources Continue reading

Asynchronous Docker metrics

Docker allows large numbers of lightweight containers can be started and stopped within seconds, creating an agile infrastructure that can rapidly adapt to changing requirements. However, the rapidly changing populating of containers poses a challenge to traditional methods of monitoring which struggle to keep pace with the changes. For example, periodic polling methods take time to detect new containers and can miss short lived containers entirely.

This article describes how the latest version of the Host sFlow agent is able to track the performance of a rapidly changing population of Docker containers and export a real-time stream of standard sFlow metrics.

The diagram above shows the life cycle status events associated with a container. The Docker Remote API provides a set of methods that allow the Host sFlow agent to communicate with the Docker to list containers and receive asynchronous container status events. The Host sFlow agent uses the events to keep track of running containers and periodically exports cpu, memory, network and disk performance counters for each container.

The diagram at the beginning of this article shows the sequence of messages, going from top to bottom, required to track a container. The Host sFlow agent first registers for container Continue reading

Triggered remote packet capture using filtered ERSPAN

Packet brokers are typically deployed as a dedicated network connecting network taps and SPAN/mirror ports to packet analysis applications such as Wireshark, Snort, etc.

Traditional hierarchical network designs were relatively straightforward to monitor using a packet broker since traffic flowed through a small number of core switches and so a small number of taps provided network wide visibility. The move to leaf and spine fabric architectures eliminates the performance bottleneck of core switches to deliver low latency and high bandwidth connectivity to data center applications. However, traditional packet brokers are less attractive since spreading traffic across many links with equal cost multi-path (ECMP) routing means that many more links need to be monitored.

This article will explore how the remote Selective Spanning capability in Cumulus Linux 3.0 combined with industry standard sFlow telemetry embedded in commodity switch hardware provides a cost effective alternative to traditional packet brokers.

Cumulus Linux uses iptables rules to specify packet capture sessions. For example, the following rule forwards packets with source IP 20.0.1.0 and destination IP 20.0.1.2 to a packet analyzer on host 20.0.2.2:

-A FORWARD --in-interface swp+ -s 20.0.0.2 -d 20. Continue reading

Real-time web analytics

The diagram shows a typical scale out web service with a load balancer distributing requests among a pool of web servers. The sFlow HTTP Structures standard is supported by commercial load balancers, including F5 and A10, and open source load balancers and web servers, including HAProxy, NGINX, Apache, and Tomcat.

The simplest way to try out the examples in this article is to download sFlow-RT and install the Host sFlow agent and Apache mod-sflow instrumentation on a Linux web server.

The following sFlow-RT metrics report request rates based on the standard sFlow HTTP counters:

http_method_option
http_method_get
http_method_head
http_method_post
http_method_put
http_method_delete
http_method_trace
http_method_connect
http_method_other
http_status_1xx
http_status_2xx
http_status_3xx
http_status_4xx
http_status_5xx
http_status_other
http_requests

In addition, mod-sflow exports the following standard thread pool metrics:

workers_active
workers_idle
workers_max
workers_utilization
req_delayed
req_dropped

Cluster performance metrics describes how sFlow-RT's REST API is used to compute summary statistics for a pool of servers. For example, the following query calculates the cluster wide total request rates:

http://localhost:8008/metric/ALL/sum:http_method_get,sum:http_method_post/json

More interesting is that the sFlow telemetry stream also includes randomly sampled HTTP request records with the following attributes:

protocol
serveraddress
serveraddress6
serverport
clientaddress
clientaddress6
clientport
proxyprotocol
proxyserveraddress
proxyserveraddress6
proxyserverport
proxyclientaddress
proxyclientaddress6
proxyclientport
httpmethod
httpprotocol
httphost
httpuseragent
httpxff
httpauthuser
httpmimetype
httpurl
httpreferer
httpstatus
Continue reading

Network and system analytics as a Docker service

The diagram shows how new and existing cloud based or locally hosted orchestration, operations, and security tools can leverage the sFlow-RT analytics service to gain real-time visibility. Network visibility with Docker describes how to install open source sFlow agents to monitor network activity in a Docker environment in order to gain visibility into Docker Microservices.

The sFlow-RT analytics software is now on Docker Hub, making it easy to deploy real-time sFlow analytics as a Docker service:

docker run -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt

Configure standard sFlow Agents to stream telemetry to the analyzer and retrieve analytics using the REST API on port 8008.

Increase memory from default 1G to 2G:

docker run -e "RTMEM=2G" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt

Set System Property to enable country lookups when Defining Flows:

docker run -e "RTPROP=-Dgeo.country=resources/config/GeoIP.dat" -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt

Run sFlow-RT Application. Drop the -d option while developing an application to see output of logging commands and use control-c to stop the container.

docker run -v /Users/pp/my-app:/sflow-rt/app/my-app -p 8008:8008 -p 6343:6343/udp -d sflow/sflow-rt

A simple Dockerfile can be used to generate a new image that includes the application:

FROM sflow/sflow-rt:latest
COPY /Users/pp/my-app /sflow-rt/app

Similarly, Continue reading

Internet router using Cumulus Linux

Internet router using merchant silicon describes how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. This article will describe the steps needed to install the software on an x86 based white box switch running Cumulus Linux 3.0.

First, add the Debian Jessie repository:

sudo sh -c 'echo "deb http://ftp.us.debian.org/debian jessie main contrib" > \
/etc/apt/sources.list.d/deb.list'

Next, install Host sFlow, Java, and Bird:

sudo apt-get update
sudo apt-get install hsflowd
sudo apt-get install unzip
sudo apt-get install default-jre-headless
sudo apt-get install bird

Install sFlow-RT (the latest version is available at sFlow-RT.com):

wget http://www.inmon.com/products/sFlow-RT/sflow-rt_2.0-1116.deb
sudo dpkg -i sflow-rt_2.0-1116.deb

Increase the default virtual memory limit for sflowrt (needs to be greater than 1/3 amount of RAM on system to start Java virtual machine, see Giant Bug: Cannot run java with a virtual mem limit (ulimit -v)):

sudo sh -c 'echo "sflowrt soft as 2000000" > \
/etc/security/limits.d/99-sflowrt.conf'

Note: Maximum Java heap memory has a default of 1G and is controlled by settings in /usr/local/sflow-rt/conf.d/sflow-rt.jvm file.

Install the Active Route Manager application:

sudo sh -c "/usr/local/sflow-rt/get-app. Continue reading

World map

World Map has been released on GitHub, https://github.com/sflow-rt/world-map. The application displays an up to the second view of traffic as animated bubbles overlaid on a world map.

Download and install sFlow-RT to run the world-map application. Enable the System Property, geo.country=resources/config/GeoIP.dat, to allow the application to identify countries based on IP addresses.

Internet router using merchant silicon

SDN router using merchant silicon top of rack switch and Dell OS10 SDN router demo discuss how an inexpensive white box switch running Linux can be used to replace a much costlier Internet router. The key to this solution is the observation that, while the full Internet routing table of over 600,000 routes is too large to fit in white box switch hardware, only a small fraction of the routes carry most of the traffic. Traffic analytics allows the active routes to be identified and installed in the hardware.

This article describes a simple self contained solution that uses standard APIs and should be able to run on a variety of Linux based network operating systems, including: Cumulus Linux, Dell OS10, Arista EOS, and Cisco NX-OS. The distinguishing feature of this solution is its real-time response, where previous solutions respond to changes in traffic within minutes or hours, this solution updates hardware routes within seconds.

The diagram shows the elements of the solution. Standard sFlow instrumentation embedded in the merchant silicon ASIC data plane in the white box switch provides real-time information on traffic flowing through the switch. The sFlow agent is configured to send the sFlow to an instance Continue reading

Network, host, and application monitoring for Amazon EC2

Microservices describes how visibility into network traffic is the key to monitoring, managing and securing applications that are composed of large numbers of communicating services running in virtual machines or containers.

Amazon Virtual Private Cloud (VPC) Flow Logs can be used to monitor network traffic:

However, there are limitations on the types of traffic that are logged, a 10-15 minute delay in accessing flow records, and costs associated with using VPC and storing the logs in CloudWatch (currently $0.50 per GB ingested, $0.03 per GB archived per month, and possible addition Data Transfer OUT charges).

In addition, collecting basic host metrics at 1 minute granularity using CloudWatch is an additional $3.50 per instance per month.

The open source Host sFlow agent offers an alternative:

Lightweight, requiring minimal CPU and memory on EC2 instances.
Real-time, up to the second network visibility
Efficient, export of extensive set of host metrics every 10-60 seconds (configurable).

This article will demonstrate how to install Host sFlow on an Amazon Linux instance:

$ cat /etc/issue
Amazon Linux AMI release 2016.03

The following commands build the latest Continue reading

Real-time BGP route analytics

The diagram shows how sFlow-RT real-time analytics software can combine BGP route information and sFlow telemetry to generate route analytics. Merging sFlow traffic with BGP route data significantly enhances both data streams:

sFlow real-time traffic data identifies active BGP routes
BGP path attributes are available in flow definitions

The following example demonstrates how to configure sFlow / BGP route analytics. In this example, the switch IP address is 10.0.0.253, the router IP address is 10.0.0.254, and the sFlow-RT address is 10.0.0.162.

Setup

First download sFlow-RT. Next create a configuration file, bgp.js, in the sFlow-RT home directory with the following contents:

var reflectorIP  = '10.0.0.254';
var myAS         = '65162';
var myID         = '10.0.0.162';
var sFlowAgentIP = '10.0.0.253';

// allow BGP connection from reflectorIP
bgpAddNeighbor(reflectorIP,myAS,myID);

// direct sFlow from sFlowAgentIP to reflectorIP routing table
// calculate a 60 second moving average byte rate for each route
bgpAddSource(sFlowAgentIP,reflectorIP,60,'bytes');

The following sFlow-RT System Properties load the configuration file and enable BGP:

script.file=bgp.js
bgp.start=yes

Start sFlow-RT and the following log lines will confirm that BGP has been enabled and configured:

 Continue reading

Configuring OpenSwitch

The following configuration enables sFlow monitoring of all interfaces on a white box switch running the OpenSwitch operating system, sampling packets at 1-in-4096, polling counters every 20 seconds and sending the sFlow to an analyzer (10.0.0.50) on UDP port 6343 (the default sFlow port):

switch(config)# sflow collector 10.0.0.50
switch(config)# sflow sampling 4096
switch(config)# sflow polling 20
switch(config)# sflow enable

A previous posting discussed the selection of sampling rates. Additional information can be found in the OpenSwitch sFlow User Guide.

See Trying out sFlow for suggestions on getting started with sFlow monitoring and reporting.

Cisco Tetration analytics

Cisco Tetration Analytics: the most Comprehensive Data Center Visibility and Analysis in Real Time, at Scale, June 15, 2016, announced the new Cisco Tetration Analytics platform. The platform collects telemetry from proprietary agents on servers and embedded in hardware on certain Nexus 9k switches, analyzes the data, and presents results via Web GUI, REST API, and as events.

Cisco Tetration Analytics Data Sheet describes the hardware requirements:

Platform Hardware	Quantity
Cisco Tetration Analytics computing nodes (servers)	16
Cisco Tetration Analytics base nodes (servers)	12
Cisco Tetration Analytics serving nodes (servers)	8
Cisco Nexus 9372PX Switches	3

And the power requirements:

Property	Cisco Tetration Analytics Platform
Peak power for Cisco Tetration Analytics Platform (39-RU single-rack option)	22.5 kW
Peak power for Cisco Tetration Analytics Platform (39-RU dual-rack option)	11.25 kW per rack (22.5 KW Total)

No pricing is given, but based on the hardware, data center space, power and cooling requirements, this brute force approach to analytics will be reassuringly expensive to purchase and operate.

Update June 22, 2016: See 451 Research report, Cisco Tetration: a $3m, 1,700-pound appliance for network traffic analytics is born, for pricing information.

A much less expensive alternative is to use industry Continue reading

Programmable hardware: Barefoot Networks, PISA, and P4

Barefoot Networks recently came out of stealth to reveal their Tofino 6.5Tbit/second (65 X 100GE or 260 X 25GE) fully user-programmable switch. The diagram above, from the talk Programming The Network Data Plane by Changhoon Kim of Barefoot Networks, shows the Protocol Independent Switch Architecture (PISA) of the programmable switch silicon.

A logical switch data-plane described in the P4 language is compiled to program the general purpose PISA hardware. For example, the following P4 code snippet is part of a P4 sFlow implementation:

table sflow_ing_take_sample {
    /* take_sample > MAX_VAL_31 and valid sflow_session_id => take the sample */
    reads {
        ingress_metadata.sflow_take_sample : ternary;
        sflow_metadata.sflow_session_id : exact;
    }
    actions {
        nop;
        sflow_ing_pkt_to_cpu;
    }
}

Network visibility is one of the major use cases for P4 based switches. Improving Network Monitoring and Management with Programmable Data Planes describes how P4 can be used to collect information about latency and queueing in the switch forwarding pipeline.

The document also describes an architecture for In-band Network Telemetry (INT) in which the ingress switch is programmed to insert a header containing measurements to packets entering the network. Each switch in the path is programmed to append additional measurements to the packet header. The Continue reading

Merchant silicon based routing, flow analytics, and telemetry

Drivers for growth describes how switches built on merchant silicon from Broadcom ASICs dominate the current generation of data center switches, reduce hardware costs, and support an open ecosystem of switch operating systems (Cumulus Linux, OpenSwitch, Dell OS10, Broadcom FASTPATH, Pica8 PicOS, Open Network Linux, etc.).

The router market is poised to be similarly disrupted with the introduction of devices based on Broadcom's Jericho ASIC, which has the capacity to handle over 1 million routes in hardware (the full Internet routing table is currently around 600,000 routes).

An edge router is a very pricey box indeed, often costing anywhere from $100,000 to $200,000 per 100 Gb/sec port, depending on features in the router and not including optical cables that are also terribly expensive. Moreover, these routers might only be able to cram 80 ports into a half rack or full rack of space. The 7500R universal spine and 7280R universal leaf switches cost on the order of $3,000 per 100 Gb/sec port, and they are considerably denser and less expensive. - Leaving Fixed Function Switches Behind For Universal Leafs

Broadcom Jericho ASICs are currently available in Arista 7500R/7280R routers and in Cisco NCS 5000 series routers. Expect further disruption Continue reading

Docker networking with IPVLAN and Cumulus Linux

Macvlan and Ipvlan Network Drivers are being added as Docker networking options. The IPVlan L3 Mode shown in the diagram is particularly interesting since it dramatically simplifies the network by extending routing to the hosts and eliminating switching entirely.

Eliminating the complexity associated with switching broadcast domains, VLANs, spanning tree, etc. allows a purely routed network to be easily scaled to very large sizes. However, there are some challenges to overcome:

IPVlan will require routes to be distributed to each endpoint. The driver only builds the Ipvlan L3 mode port and attaches the container to the interface. Route distribution throughout a cluster is beyond the initial implementation of this single host scoped driver. In L3 mode, the Docker host is very similar to a router starting new networks in the container. They are on networks that the upstream network will not know about without route distribution.

Cumulus Networks has been working to simplify routing in the ECMP leaf and spine networks and the white paper Routing on the Host: An Introduction shows how the routing configuration used on Cumulus Linux can be extended to the hosts.

Update June 2, 2016: Routing on the Host contains packaged versions of the Continue reading

Streaming telemetry

The OpenConfig project has been getting a lot of attention lately. A number of large network operators, lead by Google, are developing "a consistent set of vendor-neutral data models (written in YANG) based on actual operational needs from use cases and requirements from multiple network operators."

The OpenConfig project extends beyond configuration, "Streaming telemetry is a new paradigm for network monitoring in which data is streamed from devices continuously with efficient, incremental updates. Operators can subscribe to the specific data items they need, using OpenConfig data models as the common interface."

Anees Shaikh's Network Field Day talk provides an overview of OpenConfig and includes an example that demonstrates how configuration and state are combined in a single YANG data model. In the example, read/write config attributes used to configure a network interface (name, description, MTU, operational state) are combined with the state attributes needed to verify the configuration (MTU, name, description, oper-status, last-change) and collect metrics (in-octets, in-ucast-pkts, in-broadcast-pkts, ...).

Anees positions OpenConfig streaming telemetry mechanism as an attractive alternative to polling for metrics using Simple Network Management Protocol (SNMP) - see Push vs Pull for a detailed comparison between pushing (streaming) and pulling (polling) metrics.

Streaming telemetry is Continue reading

Internet of Things (IoT) telemetry

The internet of things (IoT) is the network of physical objects—devices, vehicles, buildings and other items—embedded with electronics, software, sensors, and network connectivity that enables these objects to collect and exchange data. - ITU

The recently released Raspberry Pi Zero (costing $5) is an example of the type of embedded low power computer enabling IoT. These small devices are typically wired to one or more sensors (measuring temperature, humidity, location, acceleration, etc.) and embedded in or attached to physical devices.

Collecting real-time telemetry from large numbers of small devices that may be located within many widely dispersed administrative domains poses a number of challenges, for example:

Discovery - How are newly connected devices discovered?
Configuration - How can the numerous individual devices be efficiently configured?
Transport - How efficiently are measurements transported and delivered?
Latency - How long does it take before measurements are remotely accessible?

This article will use the Raspberry Pi as an example to explore how the architecture of the industry standard sFlow protocol and its implementation in the open source Host sFlow agent provide a method of addressing the challenges of embedded device monitoring.

The following steps describe how to install the Host sFlow Continue reading

OVS Orbit podcast with Ben Pfaff

OVS Orbit Episode 6 is a wide ranging discussion between Ben Pfaff and Peter Phaal of the industry standard sFlow measurement protocol, implementation of sFlow in Open vSwitch, network analytics use cases and application areas supported by sFlow, including: OpenStack, Open Network Virtualization (OVN), DDoS mitigation, ECMP load balancing, Elephant and Mice flows, Docker containers, Network Function Virtualization (NFV), and microservices.

Follow the link to see listen to the podcast, read the extensive show notes, follow related links, and to subscribe to the podcast.

Raspberry Pi real-time network analytics

The Raspberry Pi model 3b is not much bigger than a credit card, costs $35, runs Linux, has a 1G RAM, and powerful 4 core 64 bit ARM processor. This article will demonstrate how to turn the Raspberry Pi into a Terribit/second real-time network analytics engine capable of monitoring hundreds of switches and thousands of switch ports.

The diagram shows how the sFlow-RT real-time analytics engine receives a continuous telemetry stream from industry standard sFlow instrumentation build into network, server and application infrastructure and delivers analytics through APIs and can easily be integrated with a wide variety of on-site and cloud, orchestration, DevOps and Software Defined Networking (SDN) tools.

A future article will examine how the Host sFlow agent can be used to efficiently stream measurements from large numbers of inexpensive Rasberry Pi devices ($5 for model Zero) to the sFlow-RT collector to monitor and control the "Internet of Things" (IoT).

The following instructions show how to install sFlow-RT on Raspbian Jesse (the Debian Linux based Raspberry Pi operating system).

wget http://www.inmon.com/products/sFlow-RT/sflow-rt_2.0-1092.deb
sudo dpkg -i --ignore-depends=openjdk-7-jre-headless sflow-rt_2.0-1092.deb

We are ignoring the dependency on openjdk and will use the default Raspbian Java 1.8 version Continue reading

« Previous 1 … 8 9 10 11 12 … 15 Next »