One of the designs I’ve been encountering a lot of recently is a “collapsed spine” data center network, as shown in the illustration below.
In this design, A and B are spine routers, while C-F are top of rack switches. The terminology is important here, because C-F are just switches; they don’t route packets. When G sends a packet to H, the packet is switched by C to A, which then routes the packet towards F, which then switches the packet towards H. C and F do not perform an IP lookup, just a MAC address lookup. A and B are responsible for setting the correct next hop MAC address to forward packets through F to H.
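The G-to-H walk described above can be sketched in a few lines of Python. This is only a rough model under stated assumptions: the class names, MAC addresses, and IP addresses are all hypothetical, G is assumed to use A as its default gateway, and B is left out for brevity. The point is only to show where the MAC lookup and the IP lookup happen.

```python
from dataclasses import dataclass

# Hypothetical addresses, chosen only for illustration.
MAC_A, MAC_G, MAC_H = "02:00:00:00:00:0a", "02:00:00:00:00:10", "02:00:00:00:00:11"
IP_G, IP_H = "10.0.1.10", "10.0.2.10"


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


class TorSwitch:
    """Layer 2 only: forwards on destination MAC and never reads the IP header."""

    def __init__(self, name, mac_table):
        self.name = name
        self.mac_table = mac_table  # destination MAC -> attached neighbor

    def forward(self, frame):
        return self.mac_table[frame.dst_mac]


class SpineRouter:
    """Does the IP lookup and rewrites the next hop MAC address."""

    def __init__(self, name, mac, neighbors, downlinks):
        self.name = name
        self.mac = mac
        self.neighbors = neighbors  # host IP -> host MAC
        self.downlinks = downlinks  # host MAC -> ToR switch that reaches it

    def route(self, frame):
        next_hop_mac = self.neighbors[frame.dst_ip]  # the only IP lookup on the path
        rewritten = Frame(src_mac=self.mac, dst_mac=next_hop_mac, dst_ip=frame.dst_ip)
        return rewritten, self.downlinks[next_hop_mac]


def trace_g_to_h():
    c = TorSwitch("C", {MAC_A: "A", MAC_G: "G"})
    f = TorSwitch("F", {MAC_A: "A", MAC_H: "H"})
    a = SpineRouter("A", MAC_A, {IP_G: MAC_G, IP_H: MAC_H}, {MAC_G: "C", MAC_H: "F"})

    # G addresses the frame to its gateway (A) at layer 2 and to H at layer 3.
    frame = Frame(src_mac=MAC_G, dst_mac=MAC_A, dst_ip=IP_H)
    path = ["G", "C"]
    path.append(c.forward(frame))   # C: MAC lookup only, hands the frame to A
    frame, tor = a.route(frame)     # A: IP lookup, sets H's MAC as the next hop
    path.append(tor)
    path.append(f.forward(frame))   # F: MAC lookup only, hands the frame to H
    return path, frame


if __name__ == "__main__":
    path, frame = trace_g_to_h()
    print(" -> ".join(path))
```

Note that the ToR switches in this sketch hold nothing but a MAC table; all of the IP state lives on the spine router.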
What are the positive aspects of this design? Primarily that all processing is handled on the two spine routers—the top of rack switches don’t need to keep any sort of routing table, nor do any IP lookups. This means you can use very inexpensive devices for your ToR. In brownfield deployments, so long as the existing ToR devices can switch based on MAC addresses, existing hardware can be used.
This design also centralizes almost all aspects of network configuration and management on the spine routers.
Most network engineers don’t spend a lot of time thinking about their supply chain: you just call your favorite vendor, place an order, and a few weeks later the hardware shows up on your loading dock. It’s not so simple any more. If you disaggregate, you need to manage your software and hardware supply chains separately. You need to think about security in your supply chain: is that software package backdoored? Moving to the cloud might seem to solve these problems, but it doesn’t. Even virtual networks have physical limits.
Listen in as Mike Bushong, Brooks Westbrook, Eyvonne Sharp, Tom Ammon, and Russ White discuss supply chain diversity and security.
We have the twelve truths of networking, and possibly Akin’s Laws, but is there a set of rules for network design? I couldn’t find one, so I decided to create one, containing 18 laws I’ve listed below.
Russ’ Rules of Network Design
I’m teaching my troubleshooting webinar this Friday. I’ve revamped the slides entirely, so this will likely be a big change for anyone who’s attended previous versions of this. Three hours, 109 slides, and interaction through the chat window … all to develop some really good skills in how to troubleshoot. For those who are curious, I was taught formal troubleshooting skills in my early life in electronics, learning my lessons on ILS, RADAR, and radio systems of various kinds. This webinar is my adaptation of those skills for network engineers.
Two things have been top of mind for those who watch the ‘net and global Internet policy—the increasing number of widespread outages, and the logical and physical centralization of the ‘net. How do these things relate to one another? Alban Kwan joins us to discuss the relationship between centralization and widespread outages. You can read Alban’s article on the topic here.
Drones are becoming—and in many cases have already become—an everyday part of our lives. Drones are used in warfare, delivery services, photography, and recreation. One of the problems facing the world of drones, however, is the strong tie-in between the controller and the drone; this proprietary link limits innovation and reduces the information available to public officials to manage traffic, and even to protect the privacy of drone operators. The DRIP working group is building protocols designed to standardize the drone-to-controller interface, advancing the state of the art in drones and opening up the field for innovation. Stuart Card joins Alvaro Retana and Russ White to discuss DRIP.
Off-topic post for today …
In the battle between marketing and security, marketing always wins. This topic came to mind after reading an article on using email aliases to control your email—
One of the most basic things you can do to increase your security against phishing attacks is to have two email addresses, one you give to financial institutions and another one you give to “everyone else.” It would be nice to have a third for newsletters and marketing, but this won’t work in the real world. Why?
Because it’s very rare to find a company that will keep two email addresses on file for you, one for “business” and another for “marketing.” To give a specific example: my mortgage company sends me both marketing messages in the form of a “newsletter” and information about mortgage activity. They only keep one email address on file, …
Project AI+Compassion just interviewed Heidi Roizen about compassion in IT; it’s worth listening to. From the show notes—
Language is deeply contextual—one of my favorite sayings from the theological world is if you take the text out of its context, you are just left with the con. What does context have to do with development and operations, though? Can there be low and high context situations in the daily life of building and running systems? Thomas Limoncelli joins Tom Ammon and Russ White to discuss the idea of low context devops, and the larger issue of context in managing projects and teams, on this episode of the Hedge.
Everyone is aware that it always takes longer to find a problem in a network than it should. Moving through the troubleshooting process often feels like swimming in molasses—you’re pulling hard, and progress is being made, but never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found out, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.
It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.
The problem of taking too long, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes too long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software, and processes created are obsolete before they are used.
Why does it always take too long?
It often seems like the IETF is losing steam in building standards, particularly as large cloud-scale companies are reducing their participation in standards bodies and deploying whatever works for them. Given these changes, what is the future of standards bodies like the IETF? Mark Nottingham joins Tom Ammon and Russ White in a broad-ranging discussion around this topic.
My article on Internet centralization just published over at The Public Discourse—
We’ve all been told agile is better … but as anyone who’s listened here long enough knows, if you haven’t found the tradeoffs, you haven’t looked hard enough. What is agile better for? Are there times when agile is better, and times when more traditional project management processes are better? Mike Bushong joins Tom Ammon, Eyvonne Sharp, and Russ White on this, the 95th episode of the Hedge, to discuss his experience with implementing agile, where it works, and where it doesn’t.