Building an elastic query engine on disaggregated storage
Building an elastic query engine on disaggregated storage, Vuppalapati, NSDI’20
This paper describes the design decisions behind the Snowflake cloud-based data warehouse. As the saying goes, ‘all snowflakes are special’ – but what is it exactly that’s special about this one?
When I think about cloud-native architectures, I think about disaggregation (enabling each resource type to scale independently), fine-grained units of resource allocation (enabling rapid response to changing workload demands, i.e. elasticity), and isolation (keeping tenants apart). Through a study of customer workloads it is revealed that Snowflake scores well on these fronts at a high level, but once you zoom in there are challenges remaining.
This paper presents Snowflake design and implementation along with a discussion on how recent changes in cloud infrastructure (emerging hardware, fine-grained billing, etc.) have altered the many assumptions that guided the design and optimization of the Snowflake system.
From shared-nothing to disaggregation
Traditional data warehouse systems are largely based on shared-nothing designs: persistent data is partitioned across a set of nodes, each responsible for its local data. Analysed from the perspective of cloud-native design this presents a number of issues:
- CPU, memory, storage, and bandwidth resources are all aggregated at each Continue reading







I’ve looked at quite a few pieces of technology in the past few years. Some have addressed massive issues that I had when I was a practicing network engineer. Others have shown me new ways to do things I never thought possible. But one category of technology still baffles me to this day: The technology that assumes greenfield deployment.