Why Troubleshooting Is Overrated
This post is the result of a thought I had after someone asked me to describe an interesting problem I’d faced. I think they meant troubleshooting, because that’s how I answered it.
Speak to most network engineers about what they love about the job, and troubleshooting will crop up quite frequently. I’ve got to admit, being able to delve into a complex problem in a high pressure situation with a clock against it more often than not does give me a rush of sorts. The CLI-fu rolls off your fingers if you’ve been on point with your studies, or you’re an experienced engineer, you methodically tick off what the problems could be and there’s a “Eureka” moment where you triumphantly declare the root cause.
But then what?
I don’t mean what’s the solution to the problem. That’s usually obvious. In most cases, the root cause is one of these culprits:
– Poor design. E.g: 1Gb link in a 10Gb path, designing for ECMP and then realising they’ve used BGP to learn a default route and not influenced your metrics, so anything outside your little IGP’s domain is going to be deterministically routed.
– A fault. E.g: Link down Continue reading
