Technology has been advancing by leaps and bounds. Scientists and engineers are now designing artificial intelligence (AI)-driven robots that look and act like humans. While they're not yet a standard part of our society, this social robot technology is real and has advanced dramatically over the last few decades, to the point where these human look-alike robots, called androids, are springing up all over the place – even in Whitney Cummings' latest Netflix standup special!
These robots are being developed to someday take over jobs in retail, hospitality, health care, child and elderly care, and even police work. But are these androids as human-looking and human-acting as scientists would have you believe? Let's take a closer look and see whether they really are as human-like as their real-life counterparts.
Superficial Robots that Resemble … Us
Scientists have built AI-driven robots with two arms, two legs, and a complete, realistic face with hair and facial expressions. Some of them can smile or frown, sit or stand, and perform a number of human jobs.
From photographs and YouTube videos, many of these robots look like humans wearing a wig. A closer view, however, leaves them resembling life-sized Continue reading
(This post was written by Tim Hinrichs, Shawn Hargan, and Alex Yip.)
Policy is a topic that we’ve touched on before here at Network Heresy. In fact, policy was the focus of a series of blog posts: first describing the policy problem and why policy is so important, then describing the range of potential solutions, followed by a comparison of policy efforts within OpenStack, and finally culminating in a detailed description of Congress: a project aimed at providing “policy as a service” to OpenStack clouds. (Check out the OpenStack wiki page on Congress for more details on the Congress project itself.)
Like other OpenStack projects, Congress is moving very quickly. Recently, one of the lead developers of Congress summarized some of the performance improvements that have been realized in recent builds of Congress. These improvements include much faster query performance, dramatically reduced data import times, and significantly lower memory overhead.
If you’re interested in the full details on the performance improvements that the Congress team is seeing, go read the full post on scaling the performance of Congress over at ruleyourcloud.com. (You can also subscribe to the RSS feed Continue reading
By Justin Pettit, Ben Pfaff, Chris Wright, and Madhu Venugopal
Today we are excited to announce Open Virtual Network (OVN), a new project that brings virtual networking to the OVS user community. OVN complements the existing capabilities of OVS to add native support for virtual network abstractions, such as virtual L2 and L3 overlays and security groups. Just like OVS, our design goal is to have a production quality implementation that can operate at significant scale.
Why are we doing this? The primary goal in developing Open vSwitch has always been to provide a production-ready low-level networking component for hypervisors that could support a diverse range of network environments. As one example of the success of this approach, Open vSwitch is the most popular choice of virtual switch in OpenStack deployments. To make OVS more effective in these environments, we believe the logical next step is to augment the low-level switching capabilities with a lightweight control plane that provides native support for common virtual networking abstractions.
To achieve these goals, OVN’s design is narrowly focused on providing L2/L3 virtual networking. This distinguishes OVN from general-purpose SDN controllers or platforms.
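To make those abstractions a little more concrete, here is a minimal sketch, not taken from the announcement itself, that builds a logical switch with two ports from Python by shelling out to the ovn-nbctl utility that ships with OVN; the switch name, port names, and addresses are made up for illustration.

```python
# Minimal sketch, not from the announcement itself: building a logical L2
# switch with two ports by shelling out to the ovn-nbctl CLI. The switch
# and port names and the addresses are made-up values for illustration.
import subprocess

def nbctl(*args):
    """Run an ovn-nbctl command and return its stdout."""
    result = subprocess.run(["ovn-nbctl", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout

# A logical switch is the basic L2 virtual-network abstraction.
nbctl("ls-add", "ls0")

# Attach two logical ports and bind a MAC/IP pair to each one.
for port, addresses in [("vm1", "00:00:00:00:00:01 10.0.0.11"),
                        ("vm2", "00:00:00:00:00:02 10.0.0.12")]:
    nbctl("lsp-add", "ls0", port)
    nbctl("lsp-set-addresses", port, addresses)
```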
OVN is a new project from the Open vSwitch team to Continue reading
By Tim Hinrichs and Scott Lowe with contributions from Alex Yip, Dmitri Kalintsev, and Peter Balland
(Note: this post is also cross-published at RuleYourCloud.com, a new site focused on policy.)
In the first few parts of this series, we discussed the policy problem, we outlined dimensions of the solution space, and we gave a brief overview of the existing OpenStack policy efforts. In this post we do a deep dive into one of the (not yet incubated) OpenStack policy efforts: Congress.
Remember that to solve the policy problem, people take the ideas in their heads about how the data center ought to behave ("policy") and codify them in a language the computer system can understand. That is, the policy problem is really a programming languages problem. Not surprisingly, Congress is, at its core, a policy language plus an implementation of that language.
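As a hedged illustration of what such a policy might look like, the sketch below writes a Datalog-style rule and submits it to a Congress server over HTTP; the endpoint path, policy name, and table schemas are assumptions made for the example, not the documented Congress interface.

```python
# Hedged sketch: a Datalog-style Congress rule and one way it might be
# submitted over the REST API. The endpoint path, policy name, and the
# table schemas (nova:servers, legacy_host) are assumptions made up for
# illustration, not the documented Congress interface.
import requests

RULE = "error(vm) :- nova:servers(vm, name, host), legacy_host(host)"

resp = requests.post(
    "http://congress.example.com:1789/v1/policies/classification/rules",
    json={"rule": RULE, "comment": "flag VMs placed on legacy hosts"},
)
resp.raise_for_status()
print(resp.json())
```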
Congress is a standard cloud service; you install it on a server, give it some inputs, and interact with it via a RESTful API. Congress has two kinds of inputs:
[This post was written by OVS core contributors Justin Pettit, Ben Pfaff, and Ethan Jackson.]
The overhead associated with vSwitches has been a hotly debated topic in the networking community. In this blog post, we show how recent changes to OVS have elevated its performance to be on par with the native Linux bridge. Furthermore, CPU utilization of OVS in realistic scenarios can be up to 8x below that of the Linux bridge. This is the first of a two-part series. In the next post, we take a peek at the design and performance of the forthcoming port to DPDK, which bypasses the kernel entirely to gain impressive performance.
Open vSwitch is the most popular network back-end for OpenStack deployments and widely accepted as the de facto standard OpenFlow implementation. Open vSwitch development initially had a narrow focus — supporting novel features necessary for advanced applications such as network virtualization. However, as we gained experience with production deployments, it became clear these initial goals were not sufficient. For Open vSwitch to be successful, it not only must be highly programmable and general, it must also be blazingly fast. For the past several years, our development efforts have focused on Continue reading
(This post was written by Tim Hinrichs and Scott Lowe, with contributions from Pierre Ettori, Aaron Rosen, and Peter Balland.)
In the first two parts of this blog series we discussed the problem of policy in the data center and the features that differentiate solutions to that problem. In this post, we give a high-level overview of several policy efforts within OpenStack.
Remember that a policy is a description of how (some part of) the data center ought to behave, a service is any component in the data center that has an API, and a policy system is designed to manage some combination of past, present, and future policy violations (auditing, monitoring, and enforcement, respectively).
This overview of OpenStack policy efforts is organized around the features we identified in part 2 of this blog series. To recap, those features are:
[This post was authored by T. Sridhar and Jesse Gross.]
Earlier this year, VMware, Microsoft, Red Hat, and Intel published an IETF draft on Generic Network Virtualization Encapsulation (Geneve). This draft (first published on Valentine's Day, no less) includes authors from each of the first-generation encapsulation protocols — VXLAN, NVGRE, and STT. However, beyond the obvious appeal of unification across hypervisor platforms, the salient feature of Geneve is that it was designed from the ground up to be flexible. Nobody wants an endless cycle of new encapsulation formats as network virtualization designs and controllers mature, certainly not the vendors that have to support the ever-growing list of acronyms and RFCs.
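As a rough illustration of that flexibility, here is a hedged sketch of packing a Geneve base header and appending TLV options; the field layout follows the draft as summarized here and should be read as an illustration of the format, not a validated implementation.

```python
# Hedged sketch of packing a Geneve base header as described in the draft:
# a 2-bit version, 6-bit option length (in 4-byte words), O/C flag bits,
# a protocol type, and a 24-bit VNI, followed by variable-length options.
# Treat the exact field layout here as illustrative rather than validated.
import struct

def geneve_header(vni, proto=0x6558, options=b""):
    """Return the 8-byte Geneve base header plus any TLV options."""
    assert len(options) % 4 == 0, "options are carried in 4-byte multiples"
    ver_optlen = (0 << 6) | (len(options) // 4)   # version 0
    flags = 0                                      # O and C bits clear
    vni_and_reserved = (vni & 0xFFFFFF) << 8       # 24-bit VNI + reserved byte
    return struct.pack("!BBHI", ver_optlen, flags, proto, vni_and_reserved) + options

print(geneve_header(vni=5001).hex())
```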
Of course, press releases, standards bodies, and predictions about the future mean little without actual implementations, which is why it is important to consider the "ecosystem" from the beginning of the process. This includes software and silicon implementations, in both commercial and open source varieties. This always takes time, but because Geneve was designed to accommodate a wide variety of use cases, it has seen relatively quick uptake. Unsurprisingly, the first implementations that landed were open source software — including switches such as Open Continue reading
[This post was co-authored by Bruce Davie and Ken Duda]
Almost a year ago, we wrote a first post about our efforts to build virtual networks that span both virtual and physical resources. As we’ve moved beyond the first proofs of concept to customer trials for our combined solution, this post serves to provide an update on where we see the interaction between virtual and physical worlds heading.
Our overall approach to connecting physical and virtual resources can be viewed in two main categories:
The latter topic is something we’ve addressed in some other recent posts (here, here and here) — in this blog we’ll focus more on how we deal with physical devices at the edge of the overlay.
We first started working to design a control plane to terminate network virtualization overlays on physical devices in 2012. We started by looking at the information model, defining what information needed to be exchanged between a physical device and a network virtualization controller such as NSX. To bound the problem space, Continue reading
(This post was written by Tim Hinrichs and Scott Lowe with contributions from Martin Casado, Peter Balland, Pierre Ettori, and Dennis Moreau.)
In the first part of this series we described the policy problem: ensuring that the data center obeys the real-world rules and regulations that are pertinent to that data center. In this post, we look at the range of possible solutions by identifying some of the key features that are important for any solution to the policy problem. Those key features correspond to the following four questions, which we use to structure our discussion.
Let’s take a look at each of these questions one at a time.
Let’s start by digging deeper into an idea we touched on in the first post when describing the challenge of policy compliance: the sources of policy. While we sometimes talk about there being a single policy for a data center, the reality is Continue reading
As we've discussed previously, the vSwitch is in a great position to detect elephant, or heavy-hitter, flows because it has proximity to the guest OS and can use that position to gather additional context. This context may include the TSO send buffer, or even the guest TCP send buffer. Once an elephant is detected, it can be signaled to the underlay using standard interfaces such as DSCP. The following slide deck provides an overview of a working version of this, showing how such a setup can be used both to dynamically detect elephants and to isolate mice from the queuing delays elephants cause. We'll write about this in more detail in a later post, but for now check out the slides (and in particular the graphs showing the latency of mice with and without detection and handling).
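As a purely illustrative sketch of the detect-and-mark idea (not the actual vSwitch code), the Python below tracks per-flow byte counts and switches a flow's DSCP marking once it crosses an assumed elephant threshold; the threshold and code points are invented for the example.

```python
# Illustrative sketch, not the OVS implementation: classify a flow as an
# elephant once its byte count crosses a threshold, and choose a DSCP value
# so the underlay can queue elephants separately from mice. The threshold
# and code points are made-up parameters.
ELEPHANT_BYTES = 10 * 1024 * 1024   # assumed threshold: 10 MB
DSCP_ELEPHANT = 0x08                # assumed code point for the elephant queue
DSCP_MICE = 0x00

flow_bytes = {}   # (src, dst, sport, dport, proto) -> bytes observed so far

def account(flow, nbytes):
    """Update a flow's byte count and return the DSCP value to mark it with."""
    flow_bytes[flow] = flow_bytes.get(flow, 0) + nbytes
    return DSCP_ELEPHANT if flow_bytes[flow] >= ELEPHANT_BYTES else DSCP_MICE

# A long-lived transfer crosses the threshold and gets remarked.
flow = ("10.0.0.11", "10.0.0.42", 33312, 2049, "tcp")
for _ in range(200):
    dscp = account(flow, 64 * 1024)
print(hex(dscp))   # 0x8 once the flow has sent enough bytes
```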
(This post was written by Tim Hinrichs and Scott Lowe with contributions from Martin Casado, Mike Dvorkin, Peter Balland, Pierre Ettori, and Dennis Moreau.)
Fully automated IT provisioning and management is considered by many to be the ultimate nirvana: people log in to a self-service portal, ask for resources (compute, networking, storage, and others), and within minutes those resources are up and running. No longer are the people who use resources waiting on the people who are responsible for allocating and maintaining them. And, according to the accepted definitions of cloud computing (for example, the NIST definition in SP 800-145), self-service provisioning is a key tenet of cloud computing.
However, fully automated IT management is a double-edged sword. While having people on the critical path for IT management was time-consuming, it provided an opportunity to ensure that those resources were managed sensibly and in a way that was consistent with how the business said they ought to be managed. In other words, having people on the critical path enabled IT resources to be managed according to business policy. We cannot simply remove those people without also adding a way of ensuring that IT resources obey business policy—without introducing a way Continue reading
[This post was written by Dinesh Dutt with help from Martin Casado. Dinesh is Chief Scientist at Cumulus Networks. Before that, he was a Cisco Fellow, working on various data center technologies from ASICs to protocols to RFCs. He’s a primary co-author on the TRILL RFC and the VxLAN draft at the IETF. Sudeep Goswami, Shrijeet Mukherjee, Teemu Koponen, Dmitri Kalintsev, and T. Sridhar provided useful feedback along the way.]
In light of the seismic shifts introduced by server and network virtualization, many questions pertaining to the role of end hosts and the networking subsystem have come to the fore. Of the many questions raised by network virtualization, a prominent one is this: what function does the physical network provide in network virtualization? This post considers this question through the lens of the end-to-end argument.
Networking and Modern Data Center Applications
There are a few primary lessons learned from the large-scale data centers run by companies such as Amazon, Google, Facebook, and Microsoft. The first such lesson is that a physical network built on L3 with equal-cost multipathing (ECMP) is a good fit for the modern data center. These networks provide predictable latency, scale well, converge quickly when nodes or links change, and provide Continue reading
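As a rough illustration of how such an L3/ECMP fabric spreads load, the sketch below hashes a flow's 5-tuple to pick one of several equal-cost uplinks, so each flow stays on a single path while different flows spread across the links; the uplink names and flow tuple are invented for the example.

```python
# Minimal sketch of ECMP next-hop selection: hash a flow's 5-tuple and use
# it to pick one of the equal-cost next hops. The uplink names and the flow
# tuple are made up for illustration.
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Deterministically map a flow tuple to one of the equal-cost next hops."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

uplinks = ["spine1", "spine2", "spine3", "spine4"]
flow = ("10.0.0.11", "10.0.0.42", 33312, 80, "tcp")
print(ecmp_next_hop(flow, uplinks))
```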
[This post has been written by Martin Casado and Justin Pettit with hugely useful input from Bruce Davie, Teemu Koponen, Brad Hedlund, Scott Lowe, and T. Sridhar]
Overview
This post introduces the topic of network optimization via large-flow (elephant) detection and handling. We decompose the problem into three parts: (i) why large (elephant) flows are an important consideration, (ii) smart things we can do with them in the network, and (iii) detecting elephant flows and signaling their presence. For (i), we explain the distinction between elephants and mice and why it matters for traffic optimization. For (ii), we present a number of approaches for handling elephant flows in the physical fabric, several of which we're working on with hardware partners. These include using separate queues for elephants and mice (small flows), using a dedicated network for elephants such as an optical fast path, doing intelligent routing for elephants within the physical network, and turning elephants into mice at the edge. For (iii), we show that elephant detection can be done relatively easily in the vSwitch. In fact, Open vSwitch has supported per-flow tracking for years. We describe how it's easy to identify elephant flows at the vSwitch and Continue reading
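One of the approaches listed above, turning elephants into mice at the edge, can be illustrated with a toy sketch: split a large transfer into chunks whose headers hash differently, so ECMP scatters the pieces across paths. Every parameter in the sketch (chunk size, port rotation) is an assumption for illustration, and a real implementation would have to preserve ordering and congestion behavior, which this toy ignores.

```python
# Toy sketch of "turning elephants into mice at the edge": split a large
# transfer into chunks and vary the source port per chunk so the fabric's
# ECMP hashing spreads the pieces across paths. Chunk size and port range
# are made-up parameters; ordering and congestion control are ignored here.
CHUNK = 256 * 1024    # assumed chunk size: 256 KB
BASE_SPORT = 49152    # start of the ephemeral port range
SPREAD = 16           # number of distinct source ports to rotate through

def split_into_mice(total_bytes):
    """Yield (source_port, offset, length) for each chunk of the transfer."""
    offset, i = 0, 0
    while offset < total_bytes:
        length = min(CHUNK, total_bytes - offset)
        yield (BASE_SPORT + (i % SPREAD), offset, length)
        offset += length
        i += 1

for sport, offset, length in split_into_mice(1_000_000):
    print(f"chunk at offset {offset}: {length} bytes via source port {sport}")
```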
Network virtualization, as others have noted, is now well past the hype stage and in serious production deployments. One factor that has facilitated the adoption of network virtualization is the ease with which it can be incrementally deployed. In a typical data center, the necessary infrastructure is already in place. Servers are interconnected by a physical network that already meets the basic requirements for network virtualization: providing IP connectivity between the physical servers. And the servers are themselves virtualized, providing the ideal insertion point for network virtualization: the vswitch (virtual switch). Because the vswitch is the first hop in the data path for every packet that enters or leaves a VM, it’s the natural place to implement the data plane for network virtualization. This is the approach taken by VMware (and by Nicira before we were part of VMware) to enable network virtualization, and it forms the basis for our current deployments.
In typical data centers, however, not every machine is virtualized. “Bare metal” servers — that is, unvirtualized, or physical machines — are a fact of life in most real data centers. Sometimes they are present because they run software that is not easily virtualized, or because of Continue reading
[This post was written by Martin Casado and Amar Padmanabhan, with helpful input from Scott Lowe, Bruce Davie, and T. Sridhar]
This is the first in a multi-part discussion on visibility and debugging in networks that provide network virtualization, and specifically in the case where virtualization is implemented using edge overlays.
In this post, we’re primarily going to cover some background, including current challenges to visibility and debugging in virtual data centers, and how the abstractions provided by virtual networking provide a foundation for addressing them.
The macro point is that much of the difficulty in visibility and troubleshooting in today's environments is due to the lack of consistent abstractions that both provide an aggregate view of distributed state and hide unnecessary complexity. Network virtualization not only provides virtual abstractions that can be used to directly address many of the most pressing issues, but also provides a global view that can greatly aid in troubleshooting and debugging the physical network as well.
A Messy State of Affairs
While it’s common to blame server virtualization for complicating network visibility and troubleshooting, this isn’t entirely accurate. It is quite possible to build a static virtual datacenter and, assuming the vSwitch Continue reading
[This post was put together by Teemu Koponen, Andrew Lambeth, Rajiv Ramanathan, and Martin Casado]
Scale has been an active (and often contentious) topic in the discourse around SDN (and by SDN we refer to the traditional definition) since long before the term was coined. Criticism of the work that led to SDN argued that changing the model of the control plane to anything but full distribution would lead to scalability challenges. Later arguments reasoned that SDN results in *more* scalable network designs because there is no longer the need to flood the entire network state in order to create a global view at each switch. Finally, there is the common concern that calls into question the scalability of using traditional SDN (a la OpenFlow) to control physical switches, due to forwarding table limits.
However, while there has been a lot of talk, there have been relatively few real-world examples to back up the rhetoric. Most arguments appeal to reason, argue (sometimes convincingly) from first principles, or point to related but ultimately different systems.
The goal of this post is to add to the discourse by presenting some scaling data, taken over a two-year period, from a production network virtualization Continue reading