LGMLNov 8, 2013

Moment-based Uniform Deviation Bounds for $k$-means and Friends

arXiv:1311.1903v125 citations
Originality Incremental advance
AI Analysis

This work addresses theoretical guarantees for clustering algorithms, offering uniform deviation bounds that are incremental but provide new insights for distributions with moment constraints.

The paper tackles the problem of bounding the difference between sample and distribution costs for k-means clustering when fitting k centers to m points from distributions with bounded moments, showing the deviation decays as m^{min{-1/4, -1/2+2/p}}. It also extends this analysis to a soft clustering variant and provides refined rates for structured instances.

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p\geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes