Improved Guarantees for k-means++ and k-means++ Parallel
This work provides better theoretical justification for widely used clustering algorithms, which is incremental but important for the machine learning and data science communities.
The paper tackles the k-means clustering problem by analyzing k-means++ and its parallel variant, showing improved approximation guarantees to explain their strong practical performance, and introduces a new variant called Exponential Race k-means++ with similar guarantees.
In this paper, we study k-means++ and k-means++ parallel, the two most popular algorithms for the classic k-means clustering problem. We provide novel analyses and show improved approximation and bi-criteria approximation guarantees for k-means++ and k-means++ parallel. Our results give a better theoretical justification for why these algorithms perform extremely well in practice. We also propose a new variant of k-means++ parallel algorithm (Exponential Race k-means++) that has the same approximation guarantees as k-means++.