LGAIMSMLMay 6, 2021

Exact Acceleration of K-Means++ and K-Means$\|$

arXiv:2105.02936v16 citations
Originality Incremental advance
AI Analysis

This provides faster initialization for K-means clustering, a widely used machine learning technique, but is incremental as it optimizes existing methods without changing their algorithmic equivalence.

The paper tackled the problem of accelerating K-Means++ and K-Means|| seed selection algorithms by developing specialized pruning strategies and a dynamic priority queue, resulting in up to a 17x speedup for K-Means++ and a 551x speedup for K-Means|| with over 500x reduction in distance computations.

K-Means++ and its distributed variant K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the K-means++ and $\|$ methods have made them difficult to "best" from a holistic perspective. By considering the limited opportunities within seed selection to perform pruning, we develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means$\|$ that is faster in run-time while being algorithmicly equivalent. For both algorithms we are able to reduce distance computations by over $500\times$. For K-means++ this results in up to a 17$\times$ speedup in run-time and a $551\times$ speedup for K-means$\|$. We achieve this with simple, but carefully chosen, modifications to known techniques which makes it easy to integrate our approach into existing implementations of these algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes