Amazon Aurora: on avoiding distributed consensus for I/Os, commits, and membership changes
Amazon Aurora: on avoiding distributed consensus for I/Os, commits, and membership changes, Verbitski et al., SIGMOD’18
This is a follow-up to the paper we looked at earlier this week on the design of Amazon Aurora. I’m going to assume a level of background knowledge from that work and skip over the parts of this paper that recap those key points. What is new and interesting here are the details of how quorum membership changes are dealt with, the notion of heterogeneous quorum set members, and more detail on the use of consistency points and the redo log.
Changing quorum membership
Managing quorum failures is complex. Traditional mechanisms cause I/O stalls while membership is being changed.
As you may recall though, Aurora is designed for a world with a constant background level of failure. So once a quorum member is suspected faulty we don’t want to have to wait to see if it comes back, but nor do we want to throw away the benefits of all the state already present on a node that might in fact come back quite quickly. Aurora’s membership change protocol is designed to support continued processing during the change, to tolerate additional failures while …