Systems are inherently reliable. Until they aren’t. On a long enough timeline, even the most reliable system will eventually fail. How you manage that failure says a lot about the way you build your system or application. So why is it, then, that we’re so focused on failing?
No system is infallible. Networks go down. Cloud services get knocked offline. Even Facebook, which represents “the Internet” for a large number of people, has days when it’s unreachable. When we examine these outages, we often find issues at the core of the system that cause services to be unreachable. In the most recent outage of Amazon’s cloud system, the cause was a typo in a script that executed faster than it could be stopped.
It could also be a failure of the system to anticipate increased loads when minor failures happen. If systems aren’t built to take on additional load when the worst happens, you’re going to see bigger outages. That is a particular thorn in the side of large cloud providers like Amazon and Google. It’s also something that network architects need to be aware of when building redundant pathways to handle problems.
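To put rough numbers on that idea, here is a minimal sketch of a headroom check for redundant paths. It assumes a deliberately simplified model that is not from any provider's tooling: identical links or nodes, with the total load spread evenly across whatever survives. The function names and parameters are hypothetical, chosen only for illustration.

```python
def load_after_failures(total_load, node_count, node_capacity, failed):
    """Per-node utilization (1.0 == full) once `failed` nodes drop out.

    Assumption: survivors are identical and share the load evenly.
    """
    survivors = node_count - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes to carry the load")
    return (total_load / survivors) / node_capacity


def max_tolerable_failures(total_load, node_count, node_capacity):
    """Largest number of simultaneous failures the survivors can absorb
    without any single node exceeding its capacity."""
    if load_after_failures(total_load, node_count, node_capacity, 0) > 1.0:
        raise ValueError("system is overloaded even with every node healthy")
    failed = 0
    # Keep failing one more node while at least one survivor remains
    # and the survivors would still be at or under capacity.
    while (failed + 1 < node_count
           and load_after_failures(total_load, node_count,
                                   node_capacity, failed + 1) <= 1.0):
        failed += 1
    return failed


if __name__ == "__main__":
    # Four hypothetical 10 Gbps links carrying 28 Gbps in total.
    print(load_after_failures(28, 4, 10, 0))
    print(load_after_failures(28, 4, 10, 1))
    print(max_tolerable_failures(28, 4, 10))
```

In this made-up example, four links running at 70 percent each look comfortable, yet losing one pushes the remaining three past 93 percent, and losing a second overloads them. Real traffic rarely rebalances this evenly, so treat the result as an optimistic bound rather than a guarantee.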
Take, for example, …