DSDBCOJun 1

SplineSketch: Even More Accurate Quantiles with Error Guarantees

arXiv:2504.0120614.91 citations
Predicted impact top 78% in DS · last 90 daysOriginality Highly original
AI Analysis

For practitioners needing both accuracy and theoretical guarantees in streaming quantile estimation, SplineSketch offers a practical solution that bridges the gap between theory and practice.

SplineSketch provides near-optimal theoretical guarantees for quantile estimation with uniformly bounded rank error, while outperforming the widely used t-digest by a factor of 2-20 on synthetic and real-world datasets.

Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna algorithm or the KLL sketch, practitioners often use other sketches that perform significantly better in practice but lack theoretical guarantees. Most notably, the widely used $t$-digest has unbounded worst-case error. In this paper, we seek to get the best of both worlds. We present a new quantile summary, SplineSketch, for numeric data, offering near-optimal theoretical guarantees, namely uniformly bounded rank error, and outperforming $t$-digest by a factor of 2-20 on a range of synthetic and real-world datasets. To achieve such performance, we develop a novel approach that maintains a dynamic subdivision of the input range into buckets while fitting the input distribution using monotone cubic spline interpolation. The core challenge is implementing this method in a space-efficient manner while ensuring strong worst-case guarantees.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes