An early version of Flood Protect won the 2014 SDN Idol competition in a joint demonstration with Brocade Networks.Visit sFlow.com to learn more, evaluate pre-release versions of these products, or discuss requirements.
steal (since Linux 2.6.11)Keeping close track of the stolen time metric is particularly import when running managing virtual machines in a public cloud. For example, Netflix and Stolen Time includes the discussion:
(8) Stolen time, which is the time spent in other operating systems
when running in a virtualized environment
So how does Netflix handle this problem when using Amazon’s Cloud? Adrian admits that they tracked this statistic so closely that when an instance crossed a stolen time threshold the standard operating procedure at Netflix was to kill the VM and start it up on a different hypervisor. What Netflix realized over time was that once a VM was performing poorly because another VM was crashing the party, usually due to a poorly written or compute intensive application hogging the machine, it never really got any better and their best learned approach was to get off that machine.The following articles describe how to monitor public cloud instances using Host sFlow agents:
It is possible to simply convert the raw sFlow metrics into InfluxDB metrics. The sflow2graphite.pl script provides an example that can be modified to support InfluxDB's native format, or used unmodified with the InfluxDB Graphite input plugin. However, there are scaleability advantages to placing the sFlow-RT analytics engine in front of the time series database. For example, in large scale cloud environments the metrics for each member of a dynamic pool isn't necessarily worth trending since virtual machines are frequently added and removed. Instead, sFlow-RT tracks all the Continue reading
Traffic visibility and control with sFlow (Peter Phaal, InMon)
sFlow instrumentation has been included in Open vSwitch since version 0.99.1 (released 25 Jan 2010). This talk will introduce the sFlow architecture and discuss how it differs from NetFlow/IPFIX, particularly in regards to delivering real-time flow analytics to an SDN controller. The talk will demonstrate that sFlow measurements from Open vSwitch are identical to sFlow measurements made in hardware on bare metal switches, providing unified, end-to-end, measurement across physical and virtual networks. Finally, Open vSwitch / Mininet will be used to demonstrate Continue reading
Credit: sFlow.com |
KennyK/Shutterstock |
We would like to share our experiences with Customer SDN deployments that require OpenFlow hybrid mode. Why it matters, implementation considerations, and how to achieve better support for it in ODL
OpenFlow-compliant switches come in two types: OpenFlow-only, and OpenFlow-hybrid. OpenFlow-only switches support only OpenFlow operation, in those switches all packets are processed by the OpenFlow pipeline, and cannot be processed otherwise. OpenFlow-hybrid switches support both OpenFlow operation and normal Ethernet switching operation, i.e. traditional L2 Ethernet switching, VLAN isolation, L3 routing (IPv4 routing, IPv6 routing...), ACL and QoS processing
The rationale for supporting hybrid mode is twofold:
- Controlled switches have decades of embedded traditional networking logic. The controller does not add value to a solution if it replicates traditional forwarding logic. One alternative controller responsibility is that provides forwarding decisions when it wants to override the traditional data-plane forwarding decision.
- Controllers can be gradually incorporated into a traditional network. Continue reading
It is used to collect statistics, such as packet counts, error counts, CPU usage, etc from a large number of individual switches. What is especially interesting is that it can be used to collect sampled packets (usually only the first n bytes, containing the header), along with some metadata about those packets.
Bringing sFlow to Cumulus Linux was particuarly easy, because “hsflowd” was already available for implementing sFlow support on Linux servers. We were able to reuse that existing code, with extremely minimal modification, to implement sFlow on our Linux based switches.
sFlow allows a collector to get a statistical view of what is going on in a collection of switches, approaching per-flow granularity. This is extremely useful information to present to users for capacity planning and debugging purposes, but things really get interesting when the collector can make decisions based on the information.
For example, our friends at inMon implemented detection of elephant flows (high bandwidth), followed by marking those flows on the switch at network ingress for special QoS handling. This nearly Continue reading
Figure 1: Real-time SDN Analytics for DDoS mitigation |
include('extras/json2.js');
// Define large flow as greater than 100Mbits/sec for 1 second or longer
var bytes_per_second = 100000000/8;
var duration_seconds = 1;
var id = 0;
var controls = {};
setFlow('udp_target',
{keys:'ipdestination,udpsourceport', value:'bytes',
filter:'direction=egress', t:duration_seconds}
);
setThreshold('attack',
{metric:'udp_target', value:bytes_per_second, byFlow:true, timeout:4,
filter:{ifspeed:[1000000000]}}
);
setEventHandler(function(evt) {
if(controls[evt.flowKey]) return;
var rulename = 'ddos' + id++;
var keys = evt.flowKey.split(',');
var acl = [
'[iptables]',
'# block UDP reflection attack',
'-A FORWARD --in-interface swp+ -d ' + keys[0]
+ ' -p udp --sport ' + keys[1] + ' -j DROP'
];
http('http://'+evt.agent+':8080/acl/'+rulename,
'put','application/json',JSON.stringify(acl));
controls[evt.flowKey] = {
agent:evt.agent,
dataSource:evt.dataSource,
rulename:rulename,
Continue reading
Visibility and the software defined data center |
====================================The closure summary shows that operators took 8 hour 39 minutes to manually diagnose and remediate the problem with degraded networking infrastructure. The network related outage described in this example is not an isolated incident; other incidents described on this blog include: Packet loss, Amazon EC2 outage, Gmail outage, Delay vs utilization for Continue reading
Closure Summary: On Tuesday, June 24, 2014, at approximately 1:11 PM UTC, engineers received reports of an issue in which some customers were unable to access the Exchange Online service. Investigation determined that a portion of the networking infrastructure entered into a degraded state. Engineers made configuration changes on the affected capacity to remediate end-user impact. The issue was successfully fixed on Tuesday, June 24, 2014, at 9:50 PM UTC.
Customer Impact: Affected customers were unable to access the Exchange Online service.
Incident Start Time: Tuesday, June 24, 2014, at 1:11 PM UTC
Incident End Time: Tuesday, June 24, 2014, at 9:50 PM UTC
=====================================
Figure 1: Elephants and Mice |
sudo apt-get install hsflowdNote: Network managers may find this command odd since it is usually not possible to install third party software on switch hardware. However, what is even more radical is that Cumulus Linux allows users to download source Continue reading
Figure 1: Rise of merchant silicon |
Figure 2: Fabric: A Retrospective on Evolving SDN |
Table 1: Edge vs Fabric Functionality |
Figure 1: Active control of large flows in a multi-path topology |
Figure 2: Two path topology |