DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees
DDSketch: a fast and fully-mergeable quantile sketch with relative-error guarantees Masson et al., VLDB’19
Datadog handles a ton of metrics – some customers have endpoints generating over 10M points per second! For response times (latencies) reporting a simple metric such as ‘average’ is next to useless. Instead we want to understand what’s happening at different latency percentiles (e.g p99).
The ability to compute quantiles over aggregated metrics has been recognized to be an essential feature of any monitoring system… Given how expensive calculating exact quantiles can be for both storage and network bandwidth, most monitoring system will compress the data into sketches and compute approximate quantiles.
Fortunately there are plenty of quantile sketching algorithms available including the GK-sketch, the t-digest, the HDR histogram, and the Moments sketch that we looked at last year. For reasons we’ll see shortly though, none of those were good enough for Datadog, so they developed their own sketching data structure, DDSketch. Officially in the paper DDSketch stands for ‘Distributed Distribution Sketch’ but that seems a bit of a stretch… surely it’s the ‘Datadog Sketch’ ! A glance at the code repository for the Python implementation confirms my suspicion: there are several references to Continue reading



