Author Archives: Russ

Hedge 101: In Situ OAM

Understanding the flow of a packet is difficult in modern networks, particularly data center fabrics with their wide fanout and high ECMP counts. At the same time, solving this problem is becoming increasingly important as quality of experience becomes the dominant measure of the network. A number of vendor-specific solutions are being developed to solve this problem. In this episode of the Hedge, Frank Brockners and Shwetha Bhandari join Alvaro Retana and Russ White to discuss the in-situ OAM work currently in progress in the IPPM WG of the IETF.


Thoughts on the Collapsed Spine

One design I’ve been encountering a lot recently is the “collapsed spine” data center network, as shown in the illustration below.

In this design, A and B are spine routers, while C-F are top of rack switches. The terminology is important here, because C-F are just switches—they don’t route packets. When G sends a packet to H, the packet is switched by C to A, which then routes the packet towards F, which then switches the packet towards H. C and F do not perform an IP lookup, just a MAC address lookup. A and B are responsible for setting the correct next hop MAC address to forward packets through F to H.

What are the positive aspects of this design? Primarily that all processing is handled on the two spine routers—the top of rack switches don’t need to keep any sort of routing table, nor do any IP lookups. This means you can use very inexpensive devices for your ToR. In brownfield deployments, so long as the existing ToR devices can switch based on MAC addresses, existing hardware can be used.
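
To make the division of labor concrete, here is a minimal Python sketch of the G-to-H path described above. It is only an illustration under stated assumptions: the MAC and IP values are made up, the tables are static, and MAC learning, ARP, VLANs, and ECMP across A and B are all ignored. The point is simply that C and F look only at the destination MAC, while A does the IP lookup and rewrites the MAC header.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


class Switch:
    """Top of rack switch: forwards on destination MAC only, no IP lookup."""

    def __init__(self, mac_table):
        self.mac_table = mac_table  # destination MAC -> next device name

    def forward(self, frame):
        return self.mac_table[frame.dst_mac]


class Router:
    """Spine router: IP lookup, then rewrite of the MAC header."""

    def __init__(self, mac, neighbor_macs, mac_table):
        self.mac = mac
        self.neighbor_macs = neighbor_macs  # destination IP -> host MAC
        self.mac_table = mac_table          # destination MAC -> next device name

    def forward(self, frame):
        # Set the next hop MAC so the ToR can deliver the frame at layer 2.
        frame.src_mac = self.mac
        frame.dst_mac = self.neighbor_macs[frame.dst_ip]
        return self.mac_table[frame.dst_mac]


# Hypothetical addressing: G uses A as its default gateway, H is 10.0.2.10.
devices = {
    "C": Switch({"mac-A": "A"}),
    "A": Router("mac-A", {"10.0.2.10": "mac-H"}, {"mac-H": "F"}),
    "F": Switch({"mac-H": "H"}),
}


def trace(frame, first_hop, devices):
    """Follow a frame hop by hop until it reaches a host (a non-device name)."""
    path = [first_hop]
    while path[-1] in devices:
        path.append(devices[path[-1]].forward(frame))
    return path


# G sends to H: the frame is addressed to the gateway's MAC, with H's IP inside.
frame = Frame(src_mac="mac-G", dst_mac="mac-A", dst_ip="10.0.2.10")
path = trace(frame, "C", devices)
print(" -> ".join(path))
```

Note that the only device holding any IP-to-MAC state in this sketch is the router, which is the property that lets the ToR devices stay simple.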

This design also centralizes almost all aspects of network configuration and management on the spine routers. Continue reading

Hedge 100: Supply Chain Diversity with Brooks Westbrook and Mike Bushong

Most network engineers don’t spend a lot of time thinking about their supply chain—you just call your favorite vendor, order, and a few weeks later the hardware shows up on your loading dock. It’s not so simple any more. If you disaggregate, you need to manage your software and hardware supply chains separately. You need to think about security in your supply chain—is that software package backdoored? Moving to the cloud might seem to solve these problems, but it doesn’t. Even virtual networks have physical limits.

Listen in as Mike Bushong, Brooks Westbrook, Eyvonne Sharp, Tom Ammon, and Russ White discuss supply chain diversity and security.


Russ’ Rules of Network Design

We have the twelve truths of networking, and possibly Akin’s Laws, but is there a set of rules for network design? I couldn’t find one, so I decided to create one, containing 18 laws I’ve listed below.

Russ’ Rules of Network Design

  1. If you haven’t found the tradeoffs, you haven’t looked hard enough.
  2. Design is an iterative process. You probably need one more iteration than you’ve done to get it right.
  3. A design isn’t finished when everything needed is added, it’s finished when everything possible is taken away.
  4. Good design isn’t making it work, it’s making it fail gracefully.
  5. Effective, elegant, efficient. All other orders are incorrect.
  6. Don’t fix blame; fix problems.
  7. Local and global optimization are mutually exclusive.
  8. Reducing state always reduces optimization someplace.
  9. Reducing state always creates interaction surfaces; shallow and narrow interaction surfaces are better than deep and broad ones.
  10. The easiest place to improve or screw up a design is at the interaction surfaces.
  11. The optimum is almost always in the middle someplace; eschew extremes.
  12. Sometimes it’s just better to start over.
  13. There are a handful of right solutions; there is an infinite array of wrong ones.
  14. You are not immensely smarter than anyone else in Continue reading

Troubleshooting Webinar this Friday

I’m teaching my troubleshooting webinar this Friday. I’ve revamped the slides entirely, so this will likely be a big change for anyone who’s attended previous versions of this. Three hours, 109 slides, and interaction through the chat window … all to develop some really good skills in how to troubleshoot. For those who are curious, I was taught formal troubleshooting skills in my early life in electronics, learning my lessons on ILS, RADAR, and radio systems of various kinds. This webinar is my adaptation of those skills for network engineers.

You can register here.

Hedge 99

Two things have been top of mind for those who watch the ‘net and global Internet policy—the increasing number of widespread outages, and the logical and physical centralization of the ‘net. How do these things relate to one another? Alban Kwan joins us to discuss the relationship between centralization and widespread outages. You can read Alban’s article on the topic here.


Hedge 098: DRIP with Stuart Card

Drones are becoming—and in many cases have already become—an everyday part of our lives. Drones are used in warfare, delivery services, photography, and recreation. One of the problems facing the world of drones, however, is the strong tie-in between the controller and the drone; this proprietary link limits innovation and reduces the information available to public officials to manage traffic, and even to protect the privacy of drone operators. The DRIP working group is building protocols designed to standardize the drone-to-controller interface, advancing the state of the art in drones and opening up the field for innovation. Stuart Card joins Alvaro Retana and Russ White to discuss DRIP.


Marketing Wins

Off-topic post for today …

In the battle between marketing and security, marketing always wins. This topic came to mind after reading an article on using email aliases to control your email—

For example, if you sign up for a lot of email newsletters, consider doing so with an alias. That way, you can quickly filter the incoming messages sent to that alias—these are probably low-priority, so you can have your provider automatically apply specific labels, mark them as read, or delete them immediately.

One of the most basic things you can do to increase your security against phishing attacks is to have two email addresses, one you give to financial institutions and another one you give to “everyone else.” It would be nice to have a third for newsletters and marketing, but this won’t work in the real world. Why?

Because it’s very rare to find a company that will keep two email addresses on file for you, one for “business” and another for “marketing.” To give specific examples—my mortgage company sends me both marketing messages in the form of a “newsletter” and information about mortgage activity. They only keep one email address on file, Continue reading

Hedge 97: Low Context DevOps

Language is deeply contextual—one of my favorite sayings from the theological world is if you take the text out of its context, you are just left with the con. What does context have to do with development and operations, though? Can there be low and high context situations in the daily life of building and running systems? Thomas Limoncelli joins Tom Ammon and Russ White to discuss the idea of low context devops, and the larger issue of context in managing projects and teams, on this episode of the Hedge.


It always takes longer than you think

Everyone is aware that it always takes longer to find a problem in a network than it should. Moving through the troubleshooting process often feels like swimming in molasses—you’re pulling hard, and progress is being made, but never fast enough or far enough to get the application back up and running before that crucial deadline. The “swimming in molasses effect” doesn’t end when the problem is found, either—repairing the problem requires juggling a thousand variables, most of which are unknown, combined with the wit and sagacity of a soothsayer to work with vendors, code releases, and unintended consequences.

It’s enough to make a network engineer want to find a mountain top and assume an all-knowing pose—even if they don’t know anything at all.

The problem of taking longer, though, applies in every area of computer networking. It takes too long for the packet to get there, it takes too long for the routing protocol to converge, it takes too long to support a new application or server. It takes so long to create and validate a network design change that the hardware, software and processes created are obsolete before they are used.

Why does it always take too long? Continue reading

Hedge 96: Mark Nottingham and the Future of Standardization

It often seems like the IETF is losing steam in building standards, particularly as large cloud-scale companies are reducing their participation in standards bodies and deploying whatever works for them. Given these changes, what is the future of standards bodies like the IETF? Mark Nottingham joins Tom Ammon and Russ White in a broad-ranging discussion around this topic.

