Troubleshooting: Basics
It’s 2 a.m., the network is down, and the CEO is on the phone asking when it will be back up. The overnight job crucial to the business opening in the morning has failed, and the company stands to lose millions of dollars if the network is not fixed in the next hour or so. Almost every network engineer has faced this situation at least once in their career, and it often involves intense bouts of troubleshooting.
And yet troubleshooting is a skill that is hardly ever taught. A number of computer science programs do include classes in troubleshooting, but these tend to focus on tools rather than technique, or on practical skill application. I was also trained in troubleshooting many years ago as a young recruit in the United States Air Force, but that training was, again, practical in bent, with very few theoretical components.
Note to readers: I wrote a short piece on troubleshooting here on rule11, but I have taken that piece down and replaced it with this short series on the topic. I did start writing a book on this topic many years ago, but my co-authors and I soon discovered troubleshooting was going …