I’ve reformatted and rebuilt my network troubleshooting live training for 2023, and am teaching it on the 26th of January (in three weeks). You can register at Safari Books Online. From the site:
The first way to troubleshoot faster is to not troubleshoot at all, or to build resilient networks. The first section of this class considers the nature of resilience, and how design tradeoffs result in different levels of resilience. The class then moves into a theoretical understanding of failures, how network resilience is measured, and how the Mean Time to Repair (MTTR) relates to human and machine-driven factors. One of these factors is the unintended consequences arising from abstractions, covered in the next section of the class.
The class then moves into troubleshooting proper, examining the half-split formal troubleshooting method and how it can be combined with more intuitive methods. This section also examines how network models can be used to guide the troubleshooting process. The class then covers two examples of troubleshooting reachability problems in a small network, and considers using ChaptGPT and other LLMs in the troubleshooting process. A third, more complex example is then covered in a data center fabric.
A short section on proving Continue reading
I’ve just finished a seven-part series over at Packets Pushers about the process of writing and publishing an RFC. Even if you don’t ever plan to write a draft or participate in the IETF, this series will give you a better idea of the work that goes into creating new standards and IETF documents.
The working group chairs asked if your draft should become a working group item, and the consensus was to accept! It might seem like your draft is home free Continue reading
As we reach the end of what has been a hard two-year stretch for what seems like the entire world, Ethan Banks joins Tom, Eyvonne, and Russ to talk about the importance of taking care of yourself. In the midst of radical changes, you can apply self-discipline to make your little part of the world a better place by keeping yourself sane, fit, and well-rested.
As we close out 2023, some random observations about engineering, culture, and life.
Network engineering needs help. I am hearing, from all over the place, that network engineering is “not cool.” There is a dearth of students entering the pipeline. College programs are struggling, and many organizations are struggling with a lack of engineering talent—in fact, I would guess the most common reason for companies to move to “the cloud” is because they cannot find anyone who knows how to build an operate a network any longer.
It probably didn’t help that for the last few years many “thought leaders” in the network engineering space have been saying there is no future in network engineering. It also doesn’t help that network engineering training has become stilted and … boring. Coders are off talking about how to solve problems. Robotics folks are working on cool projects that solve problems.
Network engineers are being taught how to spend less money and told to “find another career.”
I don’t know how we think we can sustain a healthy world of IT without network engineers.
And yes, I know there are folks who think networking problems are simple, easy enough to solve Continue reading
For this month’s roundtable, Eyvonne, Tom, and I return to Addresses to Engineering Students by Harrington and Waddell. This book, published in 1912, is a “product of its time,” and hence deserves some trigger warnings. But it is also interesting to see how advice given to engineering students over 100 years ago holds up for today. Have engineering challenges, and the engineering life, changed all that much? What kinds of advice stand the test of time, what kinds do not?
On the 26th of January, I’ll be teaching a webinar over at Safari Books Online (subscription service) called Modern Network Troubleshooting. From the blurb:
The first section of this class considers the nature of resilience, and how design tradeoffs result in different levels of resilience. The class then moves into a theoretical understanding of failures, how network resilience is measured, and how the Mean Time to Repair (MTTR) relates to human and machine-driven factors. One of these factors is the unintended consequences arising from abstractions, covered in the next section of the class.
The class then moves into troubleshooting proper, examining the half-split formal troubleshooting method and how it can be combined with more intuitive methods. This section also examines how network models can be used to guide the troubleshooting process. The class then covers two examples of troubleshooting reachability problems in a small network, and considers using ChaptGPT and other LLMs in the troubleshooting process. A third, more complex example is then covered in a data center fabric.
Terry Slattery joins Tom and Russ to continue the conversation on network automation—and why networks are not as automated as they should be. This is part one of a two-part series; the first part of this conversation was posted as episode 203.
Bad queries tend to propagate to the root zone due to the hierarchical nature of DNS, so studying traffic at a root server can provide key insights into overall network usage.
This blog covers an interesting case of suspected abuse in a gTLD registry between February and April 2023.
Gartner has raised the specter of departments outside of tech running their own IT Continue reading
Terry Slattery joins Tom and Russ to continue the conversation on network automation—and why networks are not as automated as they should be. This is part one of a two-part series; the second part will be published in two weeks as Hedge episode 204.
How is the Internet governed? Who sets the rules for the Internet, civil society, and government control? How much input should techies have, and how much should government control things? These are questions we don’t often ask, and yet are crucial to building and operating networks connected to the global Internet. George Michaelson joins Toms and Russ to talk about Internet governance—including contrary views of where things should be versus where they are.
I occasionally write over at Mind Matters on topics “other than technical.” Here are my two latest posts over there.
Running a little late on cross posting stuff from Packet Pushers … but I suppose better late than never.
It’s time to gather round the hedge and discuss whatever Eyvonne, Tom, and Russ find interesting! In this episode we discuss business logic vulnerabilities, and how we often forget to think outside the box to understand the attack surfaces that matter. We also discuss upcoming network speed increases like Wi-Fi 7 and 800G Ethernet. Do we really need these speeds, or are we just getting caught up in a hype cycle?
We’ve been on a long streak of discussions about automation, why it works, why it isn’t working, and what the networking industry can do about it. For this episode, we’re joined by the indubitable Ethan Banks. If you don’t think there’s anything left to say, you’ve not yet listened to Ethan!
Automation is a big topic–folks had a lot of feedback on our first couple of Hedge episodes on the topic. We return to automation in this episode of the Hedge with Carl Buchmann to discuss one effort at unifying automation with humble beginnings.