Moment-based quantile sketches for efficient high cardinality aggregation queries
Gan et al., VLDB’18
Today we’re temporarily pausing our tour through some of the OSDI’18 papers in order to look at a great sketch-based data structure for quantile queries over high-cardinality aggregates.
That’s a bit of a mouthful, so let’s jump straight into an example of the problem at hand. Say you have telemetry data from millions of heterogeneous mobile devices running your app. Each device tracks multiple metrics such as request latency and memory usage, and is associated with dimensional metadata (categorical variables) such as application version and hardware model.
In applications such as A/B testing, exploratory data analysis, and operations monitoring, analysts perform aggregation queries to understand how specific user cohorts, device types, and feature flags are behaving.
We want to be able to ask questions like “what’s the 99%-ile latency over the last two weeks for v8.2 of the app?”
SELECT percentile(latency, 99) FROM requests WHERE time > date_sub(curdate(), 2 WEEK) AND app_version = "v8.2"
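To make the semantics of that query concrete, here is a minimal Python sketch of what it asks for, computed exactly by filtering and sorting. It is not the sketch-based approach from the paper, only a reference point for what the answer means. The record layout, the function names `percentile` and `p99_latency`, the nearest-rank percentile definition, and the sample data are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile of an empty cohort is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def p99_latency(requests, app_version, now, window=timedelta(weeks=2)):
    """Exact 99th percentile latency for one app version over a trailing time window."""
    cutoff = now - window
    cohort = [
        r["latency"]
        for r in requests
        if r["time"] > cutoff and r["app_version"] == app_version
    ]
    return percentile(cohort, 99)


# Hypothetical data: 200 recent v8.2 requests with latencies 1..200 ms,
# one v8.2 request outside the window, and one request from another version.
now = datetime(2018, 11, 1)
requests = [
    {"time": now - timedelta(days=1), "app_version": "v8.2", "latency": float(i)}
    for i in range(1, 201)
]
requests.append({"time": now - timedelta(days=30), "app_version": "v8.2", "latency": 10_000.0})
requests.append({"time": now - timedelta(days=1), "app_version": "v8.1", "latency": 5_000.0})

result = p99_latency(requests, "v8.2", now)
print(result)
```

Note that this exact version has to hold and sort every matching latency value, so its cost grows with the size of the cohort being queried.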
As well as threshold queries such as “what combinations of app version and hardware platform have a 99th percentile latency exceeding 100ms?”
SELECT app_version, hw_model, PERCENTILE(latency,