Author Archives: Russ

Administravia 20170703

Just a short note: I’ve updated the sixty book section of the site with a new plugin designed to keep track of book libraries. Along the way, I’ve added an Amazon affiliate code, so maybe I can buy a cup of hot chocolate and a piece of banana nut bread at some point in the future. The look should really be a bit nicer, though, and it is easier to add books to this system than manually adding them as I was doing before.

Remember that the idea of sixty books is not that there are actually 60 books on the list, but rather this is what I read in an average year—and hence what I am challenging you to work up to.

The post Administravia 20170703 appeared first on rule 11 reader.

Random Thoughts on Grey Failures and Scale

I have used the example of adding parallel paths until the control plane converges more slowly, which increases the Mean Time to Repair, to show that too much redundancy can actually reduce overall network availability. Many engineers I’ve talked to balk at this idea, because it seems hard to believe that adding another link could, in fact, slow routing protocol convergence. I ran across a paper a while back that provides a different kind of example of the trade-off around redundancy in a network, but I never got around to actually reading the entire paper and trying to figure out how it fits in.
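One way to see the trade-off is the standard steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below is only an illustration: the function name and every number in it are hypothetical, chosen so that the extra paths make failures somewhat rarer while making convergence, and hence repair, noticeably slower.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the network is usable."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, not measurements from any real network.
# Fewer paths: failures are a bit more frequent, but convergence is fast.
few_paths = availability(mtbf_hours=1000.0, mttr_hours=0.5)

# More paths: failures are rarer (assumed 20% longer MTBF), but the control
# plane takes longer to converge (assumed 4x longer MTTR).
many_paths = availability(mtbf_hours=1200.0, mttr_hours=2.0)

print(f"few paths:  {few_paths:.6f}")
print(f"many paths: {many_paths:.6f}")
```

Under these assumed numbers the design with more paths ends up with lower availability, because the increase in repair time outweighs the improvement in time between failures. Different numbers can tip the result the other way, which is exactly why this is a trade-off.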

In Gray Failure: The Achilles’ Heel of Cloud-Scale Systems, the authors argue that one of the main problems with building a cloud system is grey failures: cases where a router fails only some of the time, or drops (or delays) only some small percentage of the traffic. The example given is:

  • A single service must collect information from many other services on the network to complete a particular operation
  • Each of these information collection operations represents a single transaction carried across the network
  • The more transactions there are, the …