Combining, or stitching together, open source projects to build something unique for your network is becoming more common. What does this look like in the real world? What are some of the positive and negative aspects of building things this way? How do open source projects interact with the commercial world? Daniel Teycheney joins Tom Ammon, Jeff Tantsura, and Russ White to discuss open source software in networking, particularly around network monitoring and management.
BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.
My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.
To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the Continue reading
The open source world is not much different from the commercial world in terms of building marketectures rather than usable software, largely because open source projects still rely on sources of funding and material support to build and maintain a product. Many times, however, the focus on these marketectures gets in the way of real work. Join Tom Ammon, Russ White, and Lisa Caywood as we discuss the problem of marketectures and the broader world of open source software.
One of the most important features of the Network Operating Systems available in the mid-1980s, like Banyan Vines and Novell Netware, was their integrated directory system. These directory systems allowed for the automatic discovery of many different kinds of devices attached to a network, such as printers, servers, and computers. Printers, of course, were the important item in this list, because printers have always been the bane of the network administrator’s existence. An example of one such system, an early version of Active Directory, is shown in the illustration below.
Users, devices, and resources, such as file mounts, were stored in a tree. The root of the tree was (generally) the organization. There were Organizational Units (OUs) under this root. Users and devices could belong to an OU, and be given access to devices and services in other OUs through a fairly simple drag-and-drop or GUI-based, checkbox-style interface. These systems were highly developed, making it fairly easy to find any sort of resource, including email addresses of other users in the organization, services such as shared filers, and, yes, even printers.
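The tree described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not how any of these products stored their directories; the `Node` class, the `find` helper, and every name in the example organization are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One directory entry: the organization, an OU, a user, or a device."""
    name: str
    kind: str  # hypothetical labels: "org", "ou", "user", "printer"
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child entry and return it so calls can be chained."""
        self.children.append(child)
        return child


def find(root: Node, kind: str, path: str = "") -> list[str]:
    """Walk the tree depth-first and return the full path of every entry of one kind."""
    here = f"{path}/{root.name}"
    found = [here] if root.kind == kind else []
    for child in root.children:
        found.extend(find(child, kind, here))
    return found


# Hypothetical organization with two OUs, each holding a user and a printer.
org = Node("ExampleCorp", "org")
eng = org.add(Node("Engineering", "ou"))
sales = org.add(Node("Sales", "ou"))
eng.add(Node("alice", "user"))
eng.add(Node("laser-2f", "printer"))
sales.add(Node("bob", "user"))
sales.add(Node("inkjet-1f", "printer"))

printers = find(org, "printer")
```

Because every entry hangs off a single root, finding all resources of one kind is a single tree walk, which is roughly why locating a printer in another OU was easy in these systems.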
The original system of this kind was Banyan’s Streetalk, which did not have the Continue reading
Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are a lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!
Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?
In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.
If the first pushback is always true—the speed of the underlay control plane Continue reading
Someone recently asked me to suggest a list of books on thinking skills; I figured others might be interested in the list, as well, so … I decided to post it here. Further, I’ve added a few books to my “recommended book list” here on rule11; I thought I’d point those out, as well. My first suggestion, of course, is that if you want to improve your thinking skills, read. I don’t just mean technical stuff, I mean all over the place, in the form of books, and a lot.
So, forthwith, some more things to read.
Thinking Books
Recently Added Books
You can find my list of recommended books here, and my goodreads profile, which lists a lot of the books I’ve read, I’m currently reading, and plan to read, here.
When we think of automation, and more broadly tooling, we tend to think of automating the configuration and (possibly) the monitoring of a network. On the other hand, a friend once observed that when interviewing coders, the first thing he asked was about the tools they had developed and used for making themselves more efficient. This “self-tooling” process turns out to be important not just to be more efficient at work, but to use time more effectively in general. Join Nick Russo, Eyvonne Sharp, Tom Ammon, and Russ White as we discuss self-tooling.
FARNT was a regional consortium of smaller network operators that eventually helped drive the adoption of TCP/IP and the global Internet, as well as helping efforts to commercialize Internet access. Join Donald Sharp and Russ White as Laura Breeden discusses the origins of FARNT, its importance in the adoption of early Internet technologies, and the many hurdles regional network operators had to overcome.
Laura is now the Board Chair at the National Digital Inclusion Alliance.
I was recently a guest on The Art of Conviction podcast, where we covered a bit of my background, some of the challenges I’ve faced in getting where I am, and then we moved into a discussion around my recently finished dissertation. I’m working to find places to publish more in the area of worldview and culture; I’ll point to those here as I can find a “home” for that side of my life.
You can find the recording here.
Beyond my episode, The Art of Conviction is a fascinating podcast; you should really subscribe and listen in.
The first post on this topic considered some basic definitions and the reasons why I am writing this series of posts. The second considered the convergence speed of BGP on a dense topology such as a DC fabric, and what mechanisms we normally use to improve BGP’s convergence speed. This post considers some of the objections to slow convergence speed: convergence speed is not important, and ECMP with high fanouts will take care of any convergence speed issues. The network below will be used for this discussion.
Two servers are connected to this five-stage butterfly: S1 and S2. Assume, for a moment, that some service is running on both S1 and S2. This service is configured in active-active mode, with all data synchronized between the servers. If some fabric device, such as C7, fails, traffic destined to either S1 or S2 across that device will be very quickly (within tens of milliseconds) rerouted through some other device, probably C6, to reach the same destination. This will happen no matter what routing protocol is being used in the underlay control plane, so why does BGP’s convergence speed matter? Further, if these services are running in the overlay, or they are designed to discover Continue reading
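The fast reroute described above can be sketched as a local ECMP decision. The sketch assumes C5, C6, and C7 are equal-cost next hops toward the destination; C5 and the membership of the set are assumptions, since the diagram is not reproduced here. The hash and function names are also hypothetical, and real forwarding hardware uses its own hash functions.

```python
from __future__ import annotations

import hashlib


def pick_next_hop(flow: tuple, next_hops: list[str]) -> str:
    """Hash the flow identifier onto one of the live equal-cost next hops."""
    if not next_hops:
        raise ValueError("no live next hops")
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def live(next_hops: list[str], failed: set[str]) -> list[str]:
    """Drop next hops whose link or device has been detected as down."""
    return [hop for hop in next_hops if hop not in failed]


# Assumed ECMP set; only C6 and C7 are named in the text above.
ecmp_set = ["C5", "C6", "C7"]

# Hypothetical flows: (source, destination, protocol, source port, destination port).
flows = [("S1", "S2", "tcp", 40000 + i, 443) for i in range(20)]

before = {flow: pick_next_hop(flow, ecmp_set) for flow in flows}
after = {flow: pick_next_hop(flow, live(ecmp_set, {"C7"})) for flow in flows}
```

The repair only needs the local device to notice that C7 is down and remove it from the set; no routing protocol message has to arrive first, which is the basis of the objection this post examines.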
FR Routing is a widely used and supported open source routing stack. In this episode of the Hedge, Alistair Woodman, Quentin Young, Donald Sharp, Tom Ammon, and Russ White discuss recent updates, additions to the CI/CD system, the release process, and operating system support. If you’re looking for a good open source, containerized routing stack for everything from route servers to DC fabrics and labbing to production, you should check out FR Routing.
I’m teaching another master class over at Juniper on February the 10th at 12 noon PT (3PM ET):
It’s typical to think about scale, speed, oversubscription, and costs when designing a data center fabric. But what about security in a world increasingly focused on privacy, data protection, and preventing downtime caused by cyber breaches? This session will consider how data center fabric software and control plane components can impact security, including the ability to effectively manage segmentation policy, the control of failure domains, and the impact host-based routing has on fabric security.
Early on in my career as a network engineer, I learned the value of sharing. When I could not figure out why a particular application was not working correctly, it was always useful to blame the application. Conversely, the application owner was often quite willing to share their problems with me, as well, by blaming the network.
A more cynical way of putting this kind of sharing is the way RFC 1925, rule 6 puts it: “It is easier to move a problem around than it is to solve it.”
Of course, the general principle applies far beyond sharing problems with your co-workers. There are many applications in network and protocol design, as well. Perhaps the most widespread case deployed in networks today is the movement to “let the controller solve the problem.” Distributed routing protocols are hard? That’s okay, just implement routing entirely on a controller. Understanding how to deploy individual technologies to solve real-world problems is hard? Simple—move the problem to the controller. All that’s needed is to tell the controller what we intend to do, and the controller can figure the rest out. If you have problems solving any problem, just call it Software Defined Continue reading
In my last post on this topic, I laid out the purpose of this series—to start a discussion about whether BGP is the ideal underlay control plane for a DC fabric—and gave some definitions. Here, I’d like to dive into the reasons to not use BGP as a DC fabric underlay control plane—and the first of these reasons is BGP converges very slowly and requires a lot of help to converge at all.
Examples abound. I’ve seen the results of two testbeds in the last several years where a DC fabric was configured with each router (switch, if you prefer) in a separate AS, and some number of routes pushed into the network. In both cases—one large-scale, the other a more moderately scaled network on physical hardware—BGP simply failed to converge. Why? A quick look at how BGP converges might help explain these results.
Assume we are watching the 110::/64 route (attached to A, on the left side of the diagram), at P. What happens when A loses its connection to 110::/64? Assume every router in this diagram is in a different AS, and the AS path length is the only factor determining the best path at every router.
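The selection rule assumed in this scenario, where AS path length is the only factor, can be sketched as follows. The AS numbers and the three candidate paths at P are hypothetical, since the diagram is not reproduced here, and the real BGP decision process has many more steps than this.

```python
from __future__ import annotations


def best_path(paths: list[tuple[int, ...]]) -> tuple[int, ...] | None:
    """Return the candidate with the shortest AS path, or None if none remain."""
    return min(paths, key=len) if paths else None


def withdraw(paths: list[tuple[int, ...]], gone: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Remove one withdrawn path from the set of candidates."""
    return [path for path in paths if path != gone]


# Hypothetical paths P has learned toward 110::/64, each ending at A's AS (65001).
candidates = [
    (65010, 65001),
    (65020, 65011, 65001),
    (65030, 65021, 65012, 65001),
]

first = best_path(candidates)
second = best_path(withdraw(candidates, first))
```

When the shortest path is withdrawn, P does not immediately conclude the destination is gone; it falls back to the next shortest path it still holds. This sketch does not model whether that longer path is itself still valid.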
Watching Continue reading
Everyone who’s heard me talk about container networking knows I think it’s a bit of a disaster. This is what you get, though, when someone says “that’s really complex, I can discard the years of experience others have in designing this sort of thing and build something a lot simpler…” The result is usually something that’s more complex. Alex Pollitt joins Tom Ammon and me to discuss container networking, and new options that do container networking right.