Justin Betz

Author Archives: Justin Betz

Production-ready automation — the how and why

Last week Cumulus announced the launch of our exciting production-ready solution. This suite of automation scripts provides customers with a quick and validated way to leverage automation for day 1 deployment and day 2 operations. Plus, it’s open source. So it’s completely free to access and use, and it will only expand and improve over time.

Amidst all of the excitement, I wanted to take an opportunity to dive into some of the details of why and how we ended up with such a unique solution. So here we go.

Let’s start with what brought us here

Like most good technology solutions, production-ready automation started with an evaluation of customer challenges.

Challenge #1: First and foremost, we want to produce features and products that help our customers build better networks — networks that are scalable, agile, flexible and efficient. Automation is a huge part of the story and we believe having a feature-rich, Linux-based operating system makes automation even better.

That said, no matter what type of operating system you’re running, most engineers have to piece together scripts and playbooks to build something custom that will hopefully (fingers crossed!) work with their new operating system. This is tedious at Continue reading

Layer 3 can do it better. I’m convinced. You should be too.

There are lots of reasons why we have a tendency to stick to what we know best, but when new solutions present themselves, as the decision makers, we have to make sure we’re still bringing the best solution to our business and our customers. This post will highlight the virtues of building an IP based fabric of point to point routed links arranged in a Clos spine and leaf topology and why it is superior to legacy layer 2 hierarchical designs in the data center.

It’s not only possible, but far easier to build, maintain and operate a pure IP based fabric than you might think. The secret is that by pushing layer 2 broadcast domains as far out to the edges as possible, the data center network can be simpler, more reliable and easier to scale. For context, consider the existing layer 2 hierarchical model illustrated below:


This design depends heavily on MLAG. The peer link is compulsory between two switches providing an MLAG. An individual link failure on the peer link would be more consequential than any of the other links. Ideally, we try to avoid linchpin situations like this. This design does provide redundancy, but depending on Continue reading