LGFeb 20, 2017

On the Consistency of $k$-means++ algorithm

arXiv:1702.06120v19 citations
Originality Synthesis-oriented
AI Analysis

This result is potentially relevant for practitioners using subsampling to cluster large datasets, though it appears incremental as it extends known sample properties to population consistency.

The paper proves that the expected objective function of the k-means++ algorithm for samples converges to the population expected value, enabling constant-factor approximation for population k-means objectives with increased sample size.

We prove in this paper that the expected value of the objective function of the $k$-means++ algorithm for samples converges to population expected value. As $k$-means++, for samples, provides with constant factor approximation for $k$-means objectives, such an approximation can be achieved for the population with increase of the sample size. This result is of potential practical relevance when one is considering using subsampling when clustering large data sets (large data bases).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes