LGCVSep 10, 2012

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

arXiv:1209.1960v11208 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the sensitivity of K-means to initialization for practitioners, but it is incremental as it focuses on comparing existing methods.

The study compared eight efficient initialization methods for K-means clustering on diverse datasets, finding that popular methods often perform poorly and identifying strong alternatives.

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes