Identifying bad ECMP paths
In the talk Move Fast, Unbreak Things! at the recent DevOps Networking Forum, Petr Lapukhov described how Facebook has tackled the problem of detecting packet loss in Equal Cost Multi-Path (ECMP) networks. At Facebook's scale, there are many parallel paths, and actively probing all of them generates a lot of data. The active tests generate over 1 Terabit/second of measurement data per Facebook data center, and a Hadoop cluster with hundreds of compute nodes is required per data center to process it. Processing the active test data can detect that packets are being lost within approximately 20 seconds, but doesn't provide the precise location where packets are dropped. A custom multi-path traceroute tool (fbtracert) is used to follow up and narrow down the location of the packet loss.
While described as measuring packet loss, the test system is really measuring path loss. For example, if there are 64 ECMP paths in a pod, then the loss of one path would result in a packet loss of approximately 1 in 64 packets in traffic flows that cross the ECMP group.
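The arithmetic behind the 1-in-64 figure can be sketched in a few lines of Python. This is only an illustration, not how Facebook's test system or any switch works: the function names are hypothetical, and the SHA-256 based `pick_path` is an assumed stand-in for a switch's ECMP hash, which in real hardware is vendor specific.

```python
import hashlib


def expected_loss_fraction(failed_paths: int, total_paths: int) -> float:
    """Fraction of traffic expected to hit a failed path, assuming an even spread."""
    if total_paths <= 0:
        raise ValueError("total_paths must be positive")
    if not 0 <= failed_paths <= total_paths:
        raise ValueError("failed_paths must be between 0 and total_paths")
    return failed_paths / total_paths


def pick_path(flow_id: str, total_paths: int) -> int:
    """Map a flow to a path index (assumption: a stand-in for the hardware hash)."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % total_paths


def observed_loss_fraction(flow_ids, total_paths: int, bad_paths: set) -> float:
    """Fraction of the given flows that are hashed onto one of the bad paths."""
    flows = list(flow_ids)
    if not flows:
        raise ValueError("at least one flow is required")
    lost = sum(1 for flow in flows if pick_path(flow, total_paths) in bad_paths)
    return lost / len(flows)


if __name__ == "__main__":
    # One bad path out of 64, as in the example above.
    print(expected_loss_fraction(1, 64))
    # Made-up flow identifiers spread over the same 64 paths.
    flows = (f"flow-{i}" for i in range(20000))
    print(observed_loss_fraction(flows, 64, {0}))
```

Because ECMP typically pins each flow to a single path, the flows that hash onto the bad path are the ones that see the loss, while the aggregate across many flows approaches the 1-in-64 ratio.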
Black hole detection describes an alternative approach. Industry standard sFlow instrumentation embedded within most vendors' switch hardware provides visibility into the...
The former Scalock reaches GA.
Let’s ignore the data flowing through the network for a moment (though the universal scaling law might provide an interesting way to look at packets or flows per second as transactions), and focus just on the control plane. When we look at the control plane, we find a routing protocol or a centralized controller that accepts information about changes in the network topology (and other data points), and builds a model of the network topology which can be used to forward traffic. Questions we can ask about the state being handled by the control plane include things like: How many changes are there? What is the rate at which this information arrives? How many changes might be present in the system at any given time? How many devices participate in the control plane?
The archive for the first of the partner ecosystem series event is live! Take a look at the HPE & Intel webinar. Thank you for joining us in this journey with the HPE partner ecosystem event series. This is only the beginning of this series of webinars & DemoFriday brought to you by the HPE Open NFV &...
New technology is emerging, designed to improve performance in NFV to bring it up to the high standards of service-provider networks.