Author Archives: Pete Lumbis

Choosing an EVPN Underlay Routing Protocol

EVPN is all the rage these days. The ability to do L2 extension and L3 isolation over a single IP fabric is a cornerstone of building the next generation of private clouds. The BGP extensions spelled out in RFC 7432, together with the VxLAN encapsulation added in the IETF draft draft-ietf-bess-evpn-overlay, established VxLAN as the data center overlay encapsulation and BGP as the control plane from VxLAN tunnel endpoint (VTEP) to VTEP. Although RFC 7938 tells us how to use BGP in the data center, it doesn’t discuss how that design behaves when BGP is also used for the overlay. As a result, every vendor seems to have their own ideas about how we should build the “underlay” network that gets from VTEP to VTEP, allowing BGP-EVPN to run over the top.

An example of a single leaf’s BGP peering for EVPN connectivity from VTEP to VTEP
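To ground the discussion, here’s a minimal sketch of one common design: eBGP unnumbered on the fabric links, with the EVPN address family activated on the same sessions so a single protocol serves as both underlay and overlay. The syntax is FRR/Cumulus Quagga style, and the ASN, router ID, interface names and loopback address are illustrative, not from any particular deployment:

    router bgp 65011
     bgp router-id 10.0.0.11
     ! one eBGP unnumbered session per fabric link
     neighbor swp51 interface remote-as external
     neighbor swp52 interface remote-as external
     !
     address-family ipv4 unicast
      ! advertise the leaf's loopback/VTEP address into the underlay
      network 10.0.0.11/32
     !
     address-family l2vpn evpn
      ! carry EVPN routes over the same sessions as the underlay
      neighbor swp51 activate
      neighbor swp52 activate
      ! advertise all locally defined VNIs from this VTEP
      advertise-all-vni

The appeal of collapsing underlay and overlay into one protocol is operational simplicity: one session per link to configure and monitor. The other protocols below trade some of that simplicity for a cleaner separation between the two layers.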

Let’s take a look at the routing protocols we could use as an underlay and weigh the strengths and weaknesses that make each one a good or bad fit for an EVPN deployment. We’ll go through IS-IS, OSPF, iBGP and eBGP. I won’t discuss EIGRP; although it’s now published as an IETF RFC, it’s still not widely supported… Continue reading

Our commitment to open networking

On Monday we released the latest version of Cumulus Linux, 3.5. It includes symmetric VxLAN routing, Voice VLAN and 10 new hardware platforms, among them General Availability (GA) of our two supported chassis: the four-slot Backpack and the eight-slot OMP800. We announced Early Access (EA) support for both chassis in our previous release, Cumulus Linux 3.4.

At Cumulus, moving fast to fix problems and get features in the hands of our customers is core to our culture. In today’s webscale networks, it’s hard for even the largest of organizations to operate on classic 18+ month buying cycles. Some folks want the ability to use new technology as soon as possible.

The EA process gives customers the ability to use working software or hardware and provide direct feedback on the final product. That feedback improves all aspects of the product, from purchasing and delivery to default configurations and operations.

When we announced EA for our chassis systems, many Fortune 500 customers expressed interest. For some, the EA process allowed them to start the purchasing process, knowing that it would take months until a final purchase order was ready. Others were able to put working, stable… Continue reading

NetDevOpEd: The power of network verification

Microsoft just published information on their internal tool, CrystalNet, which they define as “a high-fidelity, cloud-scale network emulator in daily use at Microsoft. We built CrystalNet to help our engineers in their quest to improve the overall reliability of our networking infrastructure.” You can read more about the tool in their detailed ACM paper. But what I want to talk about is how this amazing technology is accessible to you, at any organization, right now, through network verification with Cumulus VX.

What Microsoft has accomplished is truly amazing. They can emulate their entire network environment, enough to have prevented nearly 70% of the network issues they experienced over a two-year period. They have the ability to spin up hundreds of nodes with the exact same configurations and protocols they run in production. Then, by applying network tests, they verify whether proposed changes will have a negative impact on applications and services. This work took the team of Microsoft researchers over two years to develop. It’s really quite the feat!
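You don’t need a Microsoft-sized research team to get started with the same idea. As a minimal sketch, assuming an emulated Cumulus VX node running FRR/Cumulus Quagga (and with jq installed), a proposed change could be gated on a health check like this; the exact JSON field names vary with the routing suite version:

    # Fail if any BGP session on this node is not Established.
    sudo vtysh -c "show ip bgp summary json" \
      | jq -e '[.ipv4Unicast.peers[].state] | all(. == "Established")' \
      && echo "PASS: all BGP sessions established" \
      || echo "FAIL: at least one BGP session is down"

Run a battery of checks like this against every emulated node before and after a candidate change, and you have a small-scale version of what CrystalNet does for Microsoft.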

What I find exciting about this is that it validates exactly what we at Cumulus have been preaching for the last two years as well. The ability to make a 1:1 mirror of… Continue reading

Choosing your chassis: a look at different models

Simplicity, scalability, efficiency, flexibility — who doesn’t want to be able to use those words when talking about their data center? As more and more companies adopt web-scale networking and watch their growth rapidly increase, the need for an equally scalable and powerful solution becomes apparent. Fortunately, Cumulus Networks has a solution. We believe in listening to what our customers want and providing them with what they need; that’s why we support the Facebook Backpack for 64 to 128 ports of 100gig connectivity and the Edge-Core OMP800 for 256 ports of 100gig connectivity. So, what exactly is so great about these chassis? Let’s take a closer, more technical look.

The topology

When designing and building out new data centers, customers have universally agreed on spine and leaf networks as the way to go. Easy scale-out by adding more leafs when server racks are added, and more manageable oversubscription by adding more spines, make this design an obvious choice. (For example, a leaf with 48 10G host ports and six 40G uplinks runs at 2:1 oversubscription, 480G of server-facing capacity over 240G of uplink capacity; adding spines, and with them uplinks, pushes that ratio toward 1:1.) We at Cumulus have built some of the largest data centers in the world out of one-rack-unit switches: 48-port leafs and 32-port spines. These web-scale data centers build “three-tier” spine and leaf networks using spine and leaf “pods”… Continue reading

Independence from L2 Data Centers

We’ve all been there. That “non-disruptive” maintenance window that should “only be a blip”. You sit down at the terminal at 10pm expecting that adding a new server to the MLAG domain or upgrading a single switch will be a simple process, only to lose the rack of dual-attached servers and spend the rest of your Thursday night frantically trying to bring the cluster back online.

If I never spend another evening troubleshooting an outage caused by MLAG, I’ll die happy!

While MLAG provides higher availability than single-attaching servers or creating a multi-port bond to a single switch, it comes at the cost of a delicate balancing act. What if there was a way to provide redundancy without MLAG’s fragility and its risk to maintenance windows?

We at Cumulus Networks have seen many of our customers solve these problems by leveraging Cumulus Quagga, our enhanced version of the Quagga routing suite, on their server hosts, so we’ve decided to call it Routing on the Host and make it broadly available for download.

By running a routing protocol, OSPF or BGP, all the way to the server, we can resolve that MLAG problem once and for all.
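As a minimal sketch of what that looks like, here’s a Routing on the Host style FRR/Cumulus Quagga configuration for a server, using BGP unnumbered toward its two top-of-rack switches; the ASN, router ID and the 10.1.1.1/32 service address are illustrative:

    router bgp 65201
     bgp router-id 10.1.1.1
     ! one eBGP unnumbered session per NIC, one to each ToR switch
     neighbor eth1 interface remote-as external
     neighbor eth2 interface remote-as external
     !
     address-family ipv4 unicast
      ! advertise the host's loopback/service address; the fabric
      ! now ECMPs traffic to it across both uplinks, with no bond
      ! and no MLAG peer link involved
      network 10.1.1.1/32

If a switch, link or NIC fails, the routing protocol simply withdraws that path and traffic converges onto the surviving uplink. There is no MLAG state to synchronize and nothing delicate to break during a maintenance window.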

Figure 1: MLAG topology vs. all-routed topology

Over the last five years… Continue reading