ML · CV · LG · Sep 12, 2013

Recovery guarantees for exemplar-based clustering

arXiv:1309.3256v2 · 41 citations
Originality: Incremental advance
AI Analysis

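Provides theoretical recovery guarantees for exemplar-based clustering in challenging regimes where the clusters (nonoverlapping balls) sit close enough together that thresholding pairwise distances fails. The advance is incremental but practically relevant for data analysis.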

The paper proves that the linear programming relaxation of k-medoids clustering can correctly separate points from non-overlapping balls with high probability when sample size and separation distance are sufficiently large, even in challenging regimes where simple distance thresholding fails. Numerical evidence suggests recovery works under even more permissive conditions.
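Why thresholding fails can be seen on a tiny hand-picked example. The sketch below is an illustration under assumptions: the points, labels, and the helper name `max_intra_min_inter` are hypothetical and not taken from the paper. It uses two unit intervals (one-dimensional "balls") centred at 0 and 2.5, so the gap between them is 0.5. A rule of the form "same cluster iff distance < t" can only succeed if every within-cluster distance is below every between-cluster distance, and here that fails.

```python
from itertools import combinations


def max_intra_min_inter(points, labels):
    """Return (largest within-cluster distance, smallest between-cluster distance)."""
    intra, inter = [], []
    for (x, a), (y, b) in combinations(zip(points, labels), 2):
        d = abs(x - y)
        # Sort each pairwise distance by whether the two points share a label.
        (intra if a == b else inter).append(d)
    return max(intra), min(inter)


# Hypothetical 1-D data: two unit intervals centred at 0 and 2.5 (gap of 0.5).
points = [-1.0, 0.0, 1.0, 1.5, 2.5, 3.5]
labels = [0, 0, 0, 1, 1, 1]

max_intra, min_inter = max_intra_min_inter(points, labels)
print(max_intra, min_inter)  # 2.0 0.5
# A threshold t would need max_intra < t <= min_inter, which is impossible here.
print(max_intra < min_inter)  # False
```

Points 1.0 and 1.5 lie in different balls yet are only 0.5 apart, while -1.0 and 1.0 share a ball but are 2.0 apart. This is exactly the regime the abstract calls nontrivial.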

For a certain class of distributions, we prove that the linear programming relaxation of $k$-medoids clustering---a variant of $k$-means clustering where means are replaced by exemplars from within the dataset---distinguishes points drawn from nonoverlapping balls with high probability once the number of points drawn and the separation distance between any two balls are sufficiently large. Our results hold in the nontrivial regime where the separation distance is small enough that points drawn from different balls may be closer to each other than points drawn from the same ball; in this case, clustering by thresholding pairwise distances between points can fail. We also exhibit numerical evidence of high-probability recovery in a substantially more permissive regime.
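To make "linear programming relaxation of k-medoids" concrete, here is a minimal sketch using SciPy's `linprog`. It assumes one common form of the relaxation, which may differ in detail from the paper's. The variable Z[i, j] in [0, 1] is the extent to which point i is assigned to exemplar j, and Z[j, j] is the extent to which point j is opened as an exemplar. The LP minimizes the total assigned distance subject to three constraints: each point is fully assigned, points are assigned only to opened exemplars (Z[i, j] <= Z[j, j]), and exactly k exemplars are opened. The function name `kmedoids_lp` and the choice of Euclidean distance are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def kmedoids_lp(points, k):
    """Solve an LP relaxation of k-medoids; return the n-by-n matrix Z.

    Z[i, j]: fractional assignment of point i to exemplar j.
    Z[j, j]: fractional indicator that point j is an exemplar.
    """
    X = np.asarray(points, dtype=float).reshape(len(points), -1)
    n = X.shape[0]
    # Pairwise Euclidean distances (an assumption; any cost matrix could be used).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Equalities: each row of Z sums to 1, and the diagonal of Z sums to k.
    # Z is flattened row-major, so Z[i, j] sits at index i * n + j.
    A_eq = np.zeros((n + 1, n * n))
    b_eq = np.ones(n + 1)
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0
        A_eq[n, i * n + i] = 1.0
    b_eq[n] = k

    # Inequalities: Z[i, j] - Z[j, j] <= 0 for i != j.
    rows = []
    for i in range(n):
        for j in range(n):
            if i != j:
                row = np.zeros(n * n)
                row[i * n + j] = 1.0
                row[j * n + j] = -1.0
                rows.append(row)
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    res = linprog(D.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x.reshape(n, n)


# Hypothetical, well-separated 1-D data with two groups of three points.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
Z = kmedoids_lp(points, k=2)
medoids = [j for j in range(len(points)) if Z[j, j] > 0.5]
labels = Z.argmax(axis=1)
print(medoids)  # [1, 4]
```

"Recovery" in the paper's sense means the LP optimum is already integral and matches the true clusters, so no rounding is needed. On this easy example the unique optimum opens points 1.0 and 11.0 as exemplars. The paper's guarantees concern random samples from balls, a harder setting than this hand-picked case.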
