Capturing and enhancing in situ system observability for failure detection
Capturing and enhancing in situ system observability for failure detection Huang et al., OSDI’18
The central idea in this paper is simple and brilliant. The place where we have the most relevant information about the health of a process or thread is in the clients that call it. Today the state of the practice is to log and try to recover from a failed call at the client, while a totally separate failure detection infrastructure is responsible for figuring out whether or not things are working as desired. What Panorama does is turn clients into observers and reporters of the components they call, using these observations to determine component health. It works really well!
Panorama can easily integrate with popular distributed systems and detect all 15 real-world gray failures that we reproduced in less than 7s, whereas existing approaches detect only one of them in under 300s.
Panorama is open source and available at https://github.com/ryanphuang/panorama.
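To make the core idea concrete, here is a minimal sketch of clients acting as observers of the components they call. It is not Panorama's actual API: the names `Observation`, `ObservationStore`, and `observed_call` are hypothetical, and the simple majority vote over each observer's latest report is an assumption chosen for illustration, not the decision logic described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

HEALTHY = "healthy"
UNHEALTHY = "unhealthy"
UNKNOWN = "unknown"


@dataclass
class Observation:
    observer: str  # the client that made the call
    subject: str   # the component that was called
    ok: bool       # whether the call succeeded from the client's point of view


class ObservationStore:
    """Hypothetical in-memory store of observations (illustration only)."""

    def __init__(self) -> None:
        self._by_subject: Dict[str, List[Observation]] = defaultdict(list)

    def report(self, obs: Observation) -> None:
        self._by_subject[obs.subject].append(obs)

    def verdict(self, subject: str) -> str:
        # Assumption: keep only each observer's most recent report,
        # then declare the subject unhealthy on a strict majority of failures.
        latest: Dict[str, bool] = {}
        for obs in self._by_subject.get(subject, []):
            latest[obs.observer] = obs.ok
        if not latest:
            return UNKNOWN
        failures = sum(1 for ok in latest.values() if not ok)
        return UNHEALTHY if failures * 2 > len(latest) else HEALTHY


def observed_call(store: ObservationStore, observer: str, subject: str,
                  fn: Callable[[], T]) -> T:
    """Run fn() and record the outcome as an observation about subject."""
    try:
        result = fn()
    except Exception:
        # The client still handles the error as before; it also reports it.
        store.report(Observation(observer, subject, ok=False))
        raise
    store.report(Observation(observer, subject, ok=True))
    return result
```

The design point is that the error path the client already has (the `except` branch) doubles as a source of health evidence, so no separate probe of the component is needed. A real system would also need to age out stale observations and account for the observer itself being at fault, which this sketch ignores.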
Combating gray failures with multiple observers
Panorama is primarily designed to catch gray failures, in which components and systems offer degraded performance but typically don’t crash-stop. One example of such a failure is a ZooKeeper cluster that could no longer service write ...

