DSMay 31

On Sketching Trimmed Statistics

arXiv:2506.0734252.7h-index: 7
Predicted impact top 13% in DS · last 90 daysOriginality Highly original
AI Analysis

For data stream and sublinear algorithm researchers, this provides the first systematic study and efficient algorithms for trimmed statistics, a fundamental tool in robust analytics.

This paper introduces the first sublinear-space algorithms for sketching trimmed statistics of frequency vectors, including top-k and trimmed-k F_p moments. For p in [0,2], they achieve poly(log n/ε)-space when k is large, and identify a sharp threshold for sublinear space feasibility. They also extend to p>2 and improve space bounds for h-index computation.

We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, this is the first time these problems have been studied in any sublinear space setting. For $p \in [0,2]$, we obtain $poly(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large, and for general $k$ we identify a sharp structural threshold that characterizes exactly when sublinear space is possible: in particular, it is actually determined by the ratio between $a_k^2$ and $\|x_{-k}\|_2^2/k$. We extend these results to $p > 2$ and present several applications including algorithms for thresholded $F_p$ estimation and generalized impact indices. Notably, we improve the space bounds of Govindan, Monemizadeh, and Muthukrishnan (PODS 2017) for computing the $h$-index.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes