DB LGOct 13, 2020

On the Efficiency of K-Means Clustering: Evaluation, Optimization, and Algorithm Selection

arXiv:2010.06654v25.14 citations

Originality Incremental advance

AI Analysis

This work addresses the efficiency of k-means clustering for practitioners and researchers, but it is incremental as it builds on existing acceleration methods.

The paper evaluates and optimizes methods for accelerating Lloyd's algorithm for k-means clustering, introducing a unified framework UniK for analysis and achieving improved efficiency through hybridization and automated algorithm selection.

This paper presents a thorough evaluation of the existing methods that accelerate Lloyd's algorithm for fast k-means clustering. To do so, we analyze the pruning mechanisms of existing methods, and summarize their common pipeline into a unified evaluation framework UniK. UniK embraces a class of well-known methods and enables a fine-grained performance breakdown. Within UniK, we thoroughly evaluate the pros and cons of existing methods using multiple performance metrics on a number of datasets. Furthermore, we derive an optimized algorithm over UniK, which effectively hybridizes multiple existing methods for more aggressive pruning. To take this further, we investigate whether the most efficient method for a given clustering task can be automatically selected by machine learning, to benefit practitioners and researchers.

View on arXiv PDF

Similar