Author Archives: Doug Youd
So far in the previous articles, we’ve covered the initial objections to LACP and taken a deep dive into the effect on traffic patterns in an MLAG environment without LACP/static LAG. In this article we’ll explore how LACP differs from all the other available teaming techniques and then show how it could have solved a problem in this particular deployment.
I originally set out to write this as a single article, but to explain the nuances it quickly spiraled beyond that. So I decided to split it up into a few parts.
• Part1: Design choices – Which NIC teaming mode to select
• Part2: How MLAG interacts with the host
• Part3: “Ships in the night” – Sharing state between host and upstream network
An important element to consider is that LACP is the only uplink protocol supported by VMware that directly exchanges any network state information between the host and its upstream switches. An ESXi host is sort of a host, but also sort of a network switch (insofar as it forwards packets locally and makes path decisions for north/south traffic). Herein lies the problem: we effectively have network devices forwarding packets between each other, but Continue reading
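To make that state exchange concrete, here is a minimal conceptual sketch, assuming nothing about VMware’s or Cumulus’s actual implementations: the class and field names are illustrative only, but the fields themselves correspond to the actor/partner information an LACPDU carries per IEEE 802.1AX.

```python
# Conceptual sketch of the state carried in LACPDUs -- illustrative names,
# not a real API. Each end advertises its own ("actor") view and echoes
# back what it has learned about the other end ("partner").
from dataclasses import dataclass

@dataclass
class LacpEndpointInfo:
    system_id: str        # usually the device's base MAC address
    system_priority: int
    key: int              # identifies which links may aggregate together
    port: int
    port_priority: int
    # state bits from the LACPDU "state" octet
    activity: bool        # active vs. passive LACP
    timeout: bool         # short (1s) vs. long (30s) timers
    aggregation: bool
    synchronization: bool
    collecting: bool
    distributing: bool

@dataclass
class LacpPortState:
    """What one end of a link believes about itself and its partner."""
    actor: LacpEndpointInfo
    partner: LacpEndpointInfo   # learned only from received LACPDUs

def link_usable(state: LacpPortState) -> bool:
    # Traffic should only be hashed onto a member link once both ends report
    # in-sync and collecting/distributing.
    return (state.actor.synchronization and state.partner.synchronization
            and state.actor.collecting and state.partner.collecting
            and state.actor.distributing and state.partner.distributing)
```

Static LAGs and the other VMware teaming modes skip this handshake entirely; each end simply assumes the other side is forwarding correctly.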
In part1, we discussed some of the design decisions around uplink modes for VMware and a customer scenario I was working through recently. In this post, we’ll explore multi-chassis link aggregation (MLAG) in some detail and how active-active network fabrics challenge some of the assumptions made.
Disclaimer: What I’m going to describe is based on network switches running Cumulus Linux and specifically some down-in-the-weeds details on this particular MLAG implementation. That said, most of the concepts apply to similar network technologies (VPC, other MLAG implementations, stacking, virtual-chassis, etc.) as they operate in very similar ways. But YMMV.
I originally set out to write this as a single article, but to explain the nuances it quickly spiraled beyond that. So I decided to split it up into a few parts.
• Part1: Design choices – Which NIC teaming mode to select
• Part2: How MLAG interacts with the host (This page)
• Part3: “Ships in the night” – Sharing state between host and upstream network
If the host is connected to two redundant switches (which these days is all but assumed), then MLAG (and equivalent solutions) is a commonly deployed option. In simple terms, Continue reading
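As a rough illustration of the trick MLAG plays, the sketch below (names are assumptions, not a real API) shows the aggregation check a host-side LACP bond effectively applies to the LACPDUs received on each uplink: two physical switches can only be bundled into one LAG if they advertise an identical system ID and key, which is exactly the illusion an MLAG pair maintains (on Cumulus Linux, via a shared clagd system MAC).

```python
# Illustrative sketch only: can LACPDUs from two uplinks join the same LAG?
from typing import NamedTuple

class PartnerInfo(NamedTuple):
    system_id: str   # system MAC advertised in the LACPDU
    key: int         # operational key for the aggregator

def links_can_aggregate(uplink_a: PartnerInfo, uplink_b: PartnerInfo) -> bool:
    """True if both uplinks may be bundled into the same LAG."""
    return (uplink_a.system_id == uplink_b.system_id
            and uplink_a.key == uplink_b.key)

# With MLAG, both peer switches advertise the same (shared) system MAC, so
# the host happily forms one bond across two chassis. Without it, each
# switch advertises its own base MAC and the check fails.
print(links_can_aggregate(PartnerInfo("44:38:39:ff:00:01", 1),
                          PartnerInfo("44:38:39:ff:00:01", 1)))  # True
```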
Recently I’ve been helping a customer who’s working on a VMware cloud design. As is often the case, there is a set of consulting SMEs helping with the various areas: an NSX/virtualization consultant, the client’s tech team and a network guy (lucky me).
One of the interesting challenges in such a case is understanding the background behind design decisions that the other teams have made and the flow-on effects they have on other components. In my case, I have a decent background in designing a VMware cloud and networking, so I was able to help bridge the gap a little.
My pet peeve in a lot of cases is the common answer of “because it’s ‘best practice’ from vendor X,” followed by a blank stare when asked, “sure, but why?” In this particular case, I was lucky enough to have a pretty savvy customer, so a healthy debate ensued. This is that story.
Disclaimer: What I’m going to describe is based on network switches running Cumulus Linux and specifically some down-in-the-weeds details on this particular MLAG implementation. That said, most of the concepts apply to similar network technologies (VPC, other MLAG implementations, stacking, virtual-chassis, etc.) as they operate in very Continue reading
Are we feeding an L2 addiction?
One of the fundamental challenges in any network is placement and management of the boundary between switched (L2) and routed (L3) fabrics. Very large L2 environments tend to be brittle, difficult to troubleshoot and difficult to scale. With the availability of modern commodity switching ASICs that can switch or route at similar speeds/latency, smaller L3 domains become easier to justify.
There is a recent strong trend towards reducing the scale of L2 in the data center and instead using routed fabrics, especially in very large scale environments.
However, L2 environments are typically well understood by network/server operations staff and application developers, which has slowed adoption of pure L3-based fabrics. L3 designs also have some other usability challenges that need to be mitigated.
This is why the L2-over-L3 (AKA “overlay” SDN) techniques are drawing interest; they allow admins to keep provisioning the way they’re used to. But maybe we’re just feeding an addiction?
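To make “L2 over L3” concrete, here is a minimal sketch of the encapsulation step, assuming VXLAN as the overlay (the post doesn’t commit to a specific encapsulation): the original Ethernet frame is wrapped in an 8-byte VXLAN header and carried as a UDP payload across the routed fabric.

```python
# Minimal VXLAN framing sketch -- illustrative only, no sockets or VTEP logic.
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN

def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header; the result rides as a UDP payload."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit value")
    flags_word = 0x08 << 24   # 'I' flag set: a valid VNI follows
    vni_word = vni << 8       # 24-bit VNI, low byte reserved
    return struct.pack("!II", flags_word, vni_word) + inner_frame

# Example: a dummy 60-byte Ethernet frame for tenant segment 10100 can now
# traverse any routed path between VTEPs.
udp_payload = vxlan_encapsulate(b"\x00" * 60, vni=10100)
print(len(udp_payload), "byte UDP payload, destination port", VXLAN_UDP_PORT)
```

The tenant keeps its familiar L2 segment while the fabric underneath only ever sees routed UDP/IP, which is precisely the “keep provisioning the way you’re used to” appeal.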
Mark Burgess recently wrote a blog post exploring in depth how we got here and offering some longer term strategic visions. It’s a great read, I highly encourage taking a look.
But taking a step back, let’s explore Continue reading
Now that I’ve kept marketing/SEO happy, let’s dive right in. Two of the latest initiatives from the VMware marketing engine are “SDDC” (software-defined data center) and “Hyper-Converged.” Hype aside, these two concepts are fundamentally aligned with what hyper-scale operators have been doing for years. It boils right down to having generic hardware configured for complex and varied roles by layering different software.
At Cumulus Networks, we help businesses build cost-effective networks by leveraging “white box” switches together with a Linux® operating system. We feel this is the crucial missing piece to the overall SDDC vision.
First, let’s back up a bit and talk a little history. VMware started as a hypervisor stack for abstracting compute resources from the underlying server (and before that, workstations… but I digress). They moved on to more advanced management of that newly disaggregated platform, with the rise of vCenter and everything that followed.
Then VMware turned their attention to the Continue reading