Listen, you can’t name an open networking podcast “Kernel of Truth,” and NOT have an episode dedicated to the Linux kernel! So we got two of the brightest, most enthusiastic Linux experts we know into the recording booth and let them wax poetic about the language of the data center. As I soon found out, it’s harder to get Linux fans to STOP talking about Linux than it is to get them going, but hey, that just makes my job as host a lot easier! There’s nothing like listening to knowledgeable people discuss something they’re passionate about, and that’s what we’ve got in store for you.
In this episode, I’m joined by Roopa Prabhu, leader of the kernel team at Cumulus Networks, and Shrijeet Mukherjee, Cumulus’ former VP of Engineering. Specifically, our discussion revolves around the Linux kernel and Linux community. We get into some pretty interesting questions: why Linux in the data center? What has Cumulus contributed to the kernel? How has the prolific Linux community evolved? What the heck is a “boffin”?? I’m not a fan of spoilers, (thanks for ruining Avengers: Infinity War for me, Twitter!) so I’ll let you guys tune in and find Continue reading
Cumulus NetQ is on FIRE!!
Just one year ago, Cumulus launched a new product that fundamentally changes the way organizations validate and troubleshoot not just their network, but the entire Linux ecosystem. The product was named NetQ (think Network Query). It provides deep insight into the connectivity of all network devices, either now or in the past, including all switches, Linux hosts, the insides of Linux hosts (containers, direct interaction with container orchestration tools like Kubernetes, VMs, OpenStack environments) and any other devices running a Linux-based operating system connected to the network. No more manual box-by-box troubleshooting, no more wondering what happened last night, no more pulling cables to find where the issue was stemming from, no more finger pointing, no more human-led misconfigurations and no more frustration of not having sight past the edge of the network.
Instead, Cumulus NetQ, the agent-based technology that runs on anything Linux, changes all that. NetQ brings the efficiencies of web-scale to network operations with an algorithmic, preventive, centralized telemetry system built for the modern automated cloud network. NetQ aggregates and maintains data from across all Linux nodes in the data center in a time-series database, making the fabric-wide events, Continue reading
When you think of your ideal campus network, the term “web-scale” may not immediately come to mind. After all, the term web-scale is something you’re more likely to associate with the cloud than with your network. But you might be surprised to learn that your ideal campus network fits the definition of a web-scale network to a T.
Fundamentally, a web-scale network functions as a single unit that can grow and change on demand, without requiring hands-on reconfiguration of multiple switches or mass hardware replacement. And because it functions as a single unit, a web-scale network can also give you full visibility into the health of your network, end-to-end.
The primary way web-scale networks achieve this flexibility and visibility is by decoupling or disaggregating the hardware and the network operating system (NOS) that runs on the hardware. Since the advent of specialized hardware networking devices, the operating system and hardware have been tightly coupled together. Proprietary NOSes often have platform-dependent code that runs only on specialized hardware. Because of that, upgrading to a new software version often means buying new hardware. In some cases, that may be as simple as buying additional RAM to support the new version. In more Continue reading
“To boldly go where no one has gone before!”
Those words still echo in my mind as I remember watching the old Star Trek shows from yesteryear. It rings of adventure, of exploration, and of never settling for the known state of things.
It is these words that come to mind when I think of the new Voyager technology that is coming to market, which is designed to go boldly where no other technology in the packet and optical world has gone before. Voyager is the industry’s first platform to combine routing, switching and optical in a 1 RU footprint. The unique combo sets out to unify IP and optical to massively reduce complexity and costs. It will boldly transform the data center interconnect of today.
But it’s also the first open offering in the optical space. Cumulus is bringing its networking with S.O.U.L. (Simple. Open. Untethered. Linux) moxie and applying it to the transport and data center interconnect markets. This disaggregated solution dramatically reduces the cost of the current proprietary stack. It’s a solution with multiple players…
A few weeks ago, I set out to the beautiful city of Vancouver’s convention center, along with a boatload of rocket turtles and a stellar team. It was a great time with a wonderful scenic view of the ocean. I’ve been at Cumulus a few months now, but I can’t help but enjoy looking around, seeing the friends I’ve made in the industry, and the friends and companies Cumulus has worked with over the years. It is exciting to have thousands of people coming together at OpenStack Summit Vancouver to work on a shared goal.
This year, we were lucky enough to have our very own Pete Lumbis take the stage with David Iles of Mellanox to present our joint solution around the latest SDN revolution, which is centered on creating efficient virtualized data center networks using VXLAN & EVPN.
In the next few paragraphs, I’ll share some highlights of the event, some photos, and a recap of that exact discussion. There was a lot to learn and discover, and I’m excited to share the details.
On our first day, lots of things were going on — we Continue reading
Network monitoring, “Wonderwall” by Oasis, virtual test environments, Wu-Tang Clan (Cumulus Rules Everything Around Me!), validating configurations and cursed email chains. What do all of these things have in common? They’re all topics in Kernel of Truth’s second episode! Now, if you want to know HOW all of these seemingly random talking points fit together, you’ll have to listen for yourself, but the main focus of this discussion is Day 2 operations. Specifically, we get into important topics like:
Our guest panel consists of two networking ops experts from Cumulus Networks: Senior Consulting Engineer Rama Darbha (also known as “Tough Tiger Fist” according to the Wu-Tang name generator), who you’ll remember from our previous episode on network automation, and Technical Marketing Engineer Pete Lumbis (aka “Master Block Warrior”). These industry pros joined me (“Ungrateful Ambassador”) to provide first-hand experience and insight into why Day 2 operations deserve just as much attention as architectural design.
On another note, we’ve got some great news — Continue reading
No doubt about it: the prospect of adding another zero to the end of your top network speeds is exciting. And the reward of the immediately noticeable performance improvement never gets old. Speed makes a noticeable, and not just measurable, difference. And with the massive increase in the amount of data servers need to process, 100G is soon going to be a necessity for many organizations.
But increasing network speed is about more than pushing more bits across a wire. Faster networks enable you to squeeze more out of your physical rack space. You need fewer servers, fewer network connections, and – dare I say it – fewer switches. It’s true. A faster network lets you pack more computing into the same space.
Whether you plan to do a forklift upgrade to 100G or intend to replace one switch at a time, there are some key things you need to know to avoid getting locked into one switch vendor or losing backward compatibility with your existing equipment. In this post, I’m going to give you my top 5 tips for making the transition to 100G networking a smooth one.
First, a little background. Continue reading
Virtual Routing and Forwarding (VRF) is a ubiquitous concept in networking, first introduced in the late 1990s as the control and data plane mechanism to provide traffic isolation at layer 3 over a shared network infrastructure. VRF for Linux is an excellent blog that describes the technology behind VRFs, especially as it pertains to the Linux kernel. With the introduction of support for leaking of routes, VRFs get to enjoy their isolation while also having the nous to mix and mingle.
You have a valid question there. That was certainly the initial use case for VRFs. Each VRF was intended to represent a customer of a service provider, and isolation was a fundamental tenet. Each VRF had its own routing protocol sessions and its own IPv4 and IPv6 routing tables, and both route computation and packet forwarding were independent of other VRFs. All communication stayed within the VRF, except in specific scenarios such as reaching the Internet. Hershey’s wouldn’t want to get too chatty with Lindt, right? No, VRFs weren’t meant to be gregarious.
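To make the isolation idea concrete, here is a minimal toy sketch in Python. It is not how the Linux kernel or any routing suite implements VRFs; the Vrf class, the leak_route helper, the VRF names and all of the addresses are hypothetical and exist only for illustration. Each VRF keeps its own routing table, a lookup in one VRF never sees another VRF's routes, and "leaking" simply copies one chosen route across while remembering where it came from.

```python
import ipaddress


class Vrf:
    """Toy model of a VRF: a name plus a private routing table (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # Maps ip_network -> (next_hop, name of the VRF the route originated in)
        self.routes = {}

    def add_route(self, prefix, next_hop, origin=None):
        network = ipaddress.ip_network(prefix)
        self.routes[network] = (next_hop, origin or self.name)

    def lookup(self, address):
        """Longest-prefix match restricted to this VRF's own table."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def leak_route(src, dst, prefix):
    """Copy a single route from src into dst, keeping its origin (illustrative only)."""
    next_hop, origin = src.routes[ipaddress.ip_network(prefix)]
    dst.add_route(prefix, next_hop, origin=origin)


# Two isolated tenants with made-up prefixes and next hops
red = Vrf("red")
blue = Vrf("blue")
red.add_route("10.1.0.0/16", "192.0.2.1")
blue.add_route("10.2.0.0/16", "192.0.2.2")

before = red.lookup("10.2.0.5")   # red cannot see blue's prefix
leak_route(blue, red, "10.2.0.0/16")
after = red.lookup("10.2.0.5")    # now resolved through the leaked route

print(before, after)
```

Note that the leak is one-directional and per-prefix in this sketch: blue still has no route to red's 10.1.0.0/16, so isolation remains the default and sharing is an explicit exception.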
As VRFs moved outside the realm of the service provider and started finding application elsewhere, such as in the Continue reading
It’s officially summer time, so we’re bringing you the HOTTEST new content from Cumulus Networks in this month’s content roundup! Whether you want to layer on the sunscreen and enjoy our content while basking in the sun, or stay safe and cool indoors with your laptop and AC, you’re bound to enjoy what we’ve got in store for you. We’ve got new videos and white papers, and even a brand new official Cumulus Networks podcast for you to check out!
Kernel of Truth Episode 01 – Networking Automation: “Kernel of Truth” is a Cumulus Networks podcast dedicated to bringing the best of open networking thought leadership straight to your ears. Listen to our very first episode where we discuss network automation and its impact on the industry!
5 Network automation tips and tricks for NetOps: In this white paper, we’ll give you five tips and tricks to get clarity around your automation decisions and reduce any friction that may be inhibiting (further) adoption of network automation. Check it out!
Joint solution overview: OpenStack and Cumulus Networks: By combining with Cumulus Linux, you can unify the entire stack on Linux, bringing together the OpenStack servers Continue reading
We often receive the following campus design question: “do you support switch stacking?” This is a fair question, as many of the legacy vendors have promoted stacking designs for the past decade. It’s popular enough that people ask for it, so we must support it, right?
Well, the popular option isn’t always the best one, and switch stacking designs are a very good example of that philosophy. So when people ask if we support stacking, we think to ourselves “heck, no” before politely telling them that we do not because better options exist.
“Perfection is attained, not when there is nothing more to add, but when there is nothing more to take away.”
At Cumulus Networks, we believe that simplicity is the cornerstone of network design.
Or, to say it another way, complex designs fail in complex ways (shoutout to Eric Pulvino for that quote!). Our former Chief Scientist, Dinesh Dutt, gave an excellent explanation around the importance of simple building blocks in his Tech Field Day 9 Presentation (6min 50 seconds in).
Let’s address a little history on switch stacking and then break down the major technical downfalls of a stacking design, the stacking protocol itself, Continue reading
There’s no doubt that we’re in the gilded age of podcasts. Anyone you ask has at least one or two podcasts that they love to listen to and won’t stop talking about! But you’re probably tired of hearing that one friend repeat every single episode of “This American Life.” You need a new, engaging show that focuses on thought leadership and the topics in data center networking that you care about. Something to liven up your commute and make a long work day fly by. Something to help you stay up to date and relevant in your networking knowledge, hear about real examples of innovation in the data center and turn you into a super star employee that shoots up the career ladder. If that’s what you’ve been looking for, your search is finally over.
Seven years ago, Cumulus Networks took the world by storm and rocked the data center. Today, we’re giving the podcasting world a healthy dose of Cumulus goodness — introducing the official Cumulus Networks podcast, “Kernel of Truth!”
Why are we starting a podcast? In addition to wanting to share our thoughts and insight about all things networking, including automation, disaggregation, data center interconnect Continue reading
When it comes to troubleshooting, everyone talks about the power of the command tcpdump — after all, “the wire never lies.” But to really use it, you need to put in some time to understand the options. Let us save you some time and give you a quick overview of this powerful tool. You’ll be troubleshooting like a pro in no time!
For those unfamiliar with this powerful command, tcpdump is a packet analyzer that prints out a description of packets being transmitted or received over a network. Each line of output represents a packet. Every line includes a time stamp printed as hours, minutes, seconds, and fractions of a second since midnight. When it exits, it also reports the number of packets captured, packets received by the filter (the meaning of which varies by OS) and packets dropped by the kernel. Essentially, tcpdump does exactly what its name implies: it “dumps” all the information you need about the content of packets in the CLI so you can analyze it for yourself.
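If you save tcpdump's text output to a file, the structure described above is easy to pick apart with a few lines of Python. The sketch below is only an illustration: SAMPLE_OUTPUT is hand-written to resemble typical tcpdump output (the addresses, ports and counts are made up, not captured from a real network), and the summarize function is a hypothetical helper. It converts each time stamp into seconds since midnight and collects the three counters tcpdump prints when it exits.

```python
import re

# Hand-written sample resembling tcpdump text output (illustrative, not a real capture)
SAMPLE_OUTPUT = """\
12:34:56.789012 IP 10.0.0.1.22 > 10.0.0.2.51234: Flags [P.], seq 1:37, ack 1, win 501, length 36
12:34:56.789500 IP 10.0.0.2.51234 > 10.0.0.1.22: Flags [.], ack 37, win 502, length 0
2 packets captured
2 packets received by filter
0 packets dropped by kernel
"""

# Packet lines start with HH:MM:SS.fraction
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d+)\s")
# Summary lines printed when tcpdump exits
SUMMARY = re.compile(r"^(\d+) packets? (captured|received by filter|dropped by kernel)$")


def summarize(output):
    """Return (seconds-since-midnight for each packet line, summary counters)."""
    timestamps = []
    counters = {}
    for line in output.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            hours, minutes, seconds, fraction = match.groups()
            timestamps.append(
                int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float("0." + fraction)
            )
            continue
        match = SUMMARY.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return timestamps, counters


timestamps, counters = summarize(SAMPLE_OUTPUT)
print(len(timestamps), counters)
```

A nonzero "dropped by kernel" counter is worth checking first, since it means the capture itself is incomplete and the wire may have seen packets that your output does not show.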
So, why is this so important for troubleshooting? Think of it this way. When a box isn’t acting right, seeing what you are getting Continue reading
Networks are changing. More and more we’re hearing terms like whitebox, britebox, disaggregation, NOS, commodity hardware and open source when we talk about the future of networking. Since you’re reading this on the Cumulus Networks blog, I’ll assume you get that and spare you a description of these terms here. If you do want a crash course on network disaggregation and how it relates to orchestration/SDN, check out my previous post on the Packet Pushers blog.
With that bit of housekeeping out of the way, let’s dig right into today’s topic: open source software security.
First, why does security matter? If you’re like most network engineers, your primary goal typically is to get bits of data from one place to another. Anything that interferes with the free flow of packets and frames is a potential problem. So the goals of security can at first appear contrary to those of the network. Raise your hand if you’ve ever been frustrated by a firewall rule or some seemingly arcane security policy!
Unfortunately, we no longer have the luxury of ignoring security. Today’s network is one of the most crucial pieces of IT infrastructure for any organization and for the economies we operate in. Continue reading
One critical decision that executives need to make when assessing their data center architecture is their approach to software vulnerability management across all network components. Vulnerability management primarily revolves around selecting an efficient and modern software management strategy. There are several ways to execute on a software management strategy, and I believe disaggregation is a critical first step in doing it right.
In this post, I want to take a minute to first share my thoughts on the vulnerability management trends I’ve noticed. I will argue that a) you need to prioritize the network in how you manage vulnerabilities and b) disaggregation is the only way to do it properly. We’ll also take a look at the reasons why I think we never had the right framework to manage software delivery, making vulnerability management a challenge on platforms that are closed in nature.
Three weeks ago, I joined 40,000 security professionals in San Francisco to attend the biggest gathering of security conscious professionals — RSA Conference. While there were several presentations and moments from the event that stood out, one that caught my eye was a presentation that discussed challenges in the industry Continue reading
Automating your network can seem like a daunting task. But the truth is that automating Cumulus Linux with Ansible can be easier than many of the things you’re probably already automating.
In this post, I’ll show you how to get started on your network automation journey using a simple, four-step process:
To illustrate, I’ll be using the following simple, bare-bones topology based on the Cumulus Reference topology. You can follow along by spinning up your own virtual data center for free using Cumulus in the Cloud.
The first step is to pick one thing to automate. Just one! The only caveat is that it needs to be something you understand and are comfortable with. Trying to automate a feature you’ve never used is sure to scare you away from automation forever, unless of course you have someone guiding you through the process.
Preferably, pick something that’s quick and simple when done manually. Configuring the OSPF routing protocol between two switches falls into this category. When done manually, Continue reading
Despite what some people say, automation is not for the lazy. This opinion probably comes from the fact that the whole point of automation is to reduce repetitive tasks and make your life easier. Indeed, automation can do just that, as well as give you back hours each week for other tasks.
But getting your automation off the ground to begin with can be a challenge. It’s not as if you just decide, “Hey, we’re going to automate our network now!” and then you follow a foolproof, well-defined process to implement network automation across the board. You have to make many decisions that require long discussions, and necessitate ambitious and careful thinking about how you’re going to automate.
Just as with anything else in the IT world, there are no one-size-fits-all solutions, and no “best practices” that apply to every situation. But there are some common principles and crucial decision points that do apply to all automation endeavors.
In this post, I’ll give you five network automation tips and tricks to get clarity around your automation decisions and reduce any friction that may be inhibiting (further) adoption of network automation.
Automating Continue reading
Hope you brought your networking acronyms dictionary with you – this month’s Cumulus content roundup is going full tech-geek and we’re NOT ashamed! We’re brushing up on EVPN, ECMP, DWDM and TGIF (okay, not the last one. But did that make you LOL?) See a term that makes you go WTF? Don’t worry — we’ve got webinars, videos, blog posts and more to help you differentiate between BGP and OMG.
EVPN content hub: Deploying EVPN enables you to enhance your layer 3 data center with benefits such as multitenancy, scalability, ARP suppression and more. Don’t know where to begin? Browse this EVPN resources page to learn more about how you can incorporate EVPN into your Cumulus network.
Celebrating ECMP in Linux — part one: Equal Cost Multi-Path (ECMP) routes are a big component of all the super-trendy data center network designs that are en vogue right now. Read part one of this series about ECMP’s history, how it’s evolved and what Cumulus is doing to help.
Networking how-to video — What is Voyager?: Voyager is a Dense Wavelength Division Multiplexing (DWDM) platform Facebook brought to the Telecom Infra Project (TIP), bringing the first Continue reading
Many moons ago, Cumulus Networks set out to further the cause of open networking. The premise was simple: make networking operate like servers. To do that, we needed to develop an operating system platform, create a vibrant marketplace of compatible and compliant hardware and get a minimum set of features implemented in a robust way.
Today, these types of problems are largely behind us, and the problem set has moved in the right direction towards innovation and providing elegant solutions to the problems around scale, mobility and agility. Simply put, if “Linux is in the entire rack,” then it follows that the applications and services deployed via these racks should be able to move to any rack and be deployed for maximum overall efficiency.
The formula for this ephemeral agility then is based on two constructs.
One of the consistent questions that arises during the web-scale transition is the impact of managed access to networking infrastructure. How do we take traditional management techniques and adapt them to the new operational paradigm of web-scale networking, where automation drives the majority of changes and the infrastructure is treated as a holistic entity rather than node-by-node?
In the most basic way, we can migrate existing workflows to the new paradigm. Though inefficient, the old way of doing things still works with the new web-scale paradigm. The easiest way to do this is to restrict access to your switches using local privileges. In Linux, user accounts are managed using the adduser command, and file permissions for those users are controlled using the chmod command.
A list of all users is stored in the /etc/passwd file:
cumulus@leaf02:~$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:103:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:104:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:105:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:106:systemd Bus Proxy,,,:/run/systemd:/bin/false
frr:x:104:109:Frr routing suite,,,:/var/run/frr/:/bin/false
ntp:x:105:110::/home/ntp:/bin/false
uuidd:x:106:111::/run/uuidd:/bin/false
messagebus:x:107:112::/var/run/dbus:/bin/false
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
snmp:x:109:114::/var/lib/snmp:/usr/sbin/nologin
dnsmasq:x:110:65534:dnsmasq,,,:/var/lib/misc:/bin/false
_lldpd:x:111:115::/var/run/lldpd:/bin/false
cumulus:x:1000:1000:cumulus,,,:/home/cumulus:/bin/bash
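Each entry in that output is seven colon-separated fields: user name, password placeholder, UID, GID, a comment (GECOS) field, home directory and login shell. As a rough sketch of how you might audit which accounts can actually log in, the Python below parses a few of the entries shown above. The PasswdEntry type, the helper names and the NON_LOGIN_SHELLS set are my own assumptions for illustration; treating /bin/false and /usr/sbin/nologin as "cannot log in" is a simple heuristic, not a complete access check.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    """The seven colon-separated fields of one /etc/passwd line."""
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line):
    name, password, uid, gid, gecos, home, shell = line.strip().split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


# A few entries copied from the listing above
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
frr:x:104:109:Frr routing suite,,,:/var/run/frr/:/bin/false
sshd:x:108:65534::/var/run/sshd:/usr/sbin/nologin
cumulus:x:1000:1000:cumulus,,,:/home/cumulus:/bin/bash
"""

# Assumption: shells commonly assigned to accounts that should not log in
NON_LOGIN_SHELLS = {"/bin/false", "/usr/sbin/nologin"}


def interactive_users(text):
    """Names of accounts whose shell is not in NON_LOGIN_SHELLS (heuristic)."""
    entries = [parse_passwd_line(line) for line in text.splitlines() if line.strip()]
    return [entry.name for entry in entries if entry.shell not in NON_LOGIN_SHELLS]


print(interactive_users(SAMPLE))
```

On a real switch you would read the file itself rather than a pasted sample, but the field layout is the same.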
Users can be added and deleted using the adduser and deluser commands:
cumulus@leaf02:~$ sudo Continue reading
Businesses today have to get applications to market faster than ever, but with the same or less budget. Because of this requirement, modern data centers are evolving to support a change in application delivery. In order to get applications to market faster and increase revenue, applications that were once built as one monolithic entity are becoming segmented and deployed separately, communicating amongst themselves. The pieces of applications, sometimes referred to as microservices, are often deployed as containers. This results in much faster deployment and a quicker update cycle. However, the network teams operating the infrastructure supporting the applications often have no visibility into how their networks are being utilized, and thus are making design, operations and troubleshooting decisions blindly. Now, Cumulus NetQ provides this visibility from container deployments all the way to the spine switches and beyond — accelerating operations and providing the crucial information to efficiently design and operate the networks running containers.
The new application design and deployment method using containers makes operating and managing the supporting infrastructure very challenging. The containers often have to talk with each other within or across data centers or to the outside world. An Continue reading