LGNov 22, 2022

Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

arXiv:2211.12271v35.847 citationsh-index: 44Has Code

Originality Incremental advance

AI Analysis

This work addresses the computational bottleneck in deterministic clustering for data scientists, but it is incremental as it builds on existing k-means++ methods.

The paper tackles the high computational cost of the global k-means algorithm by proposing global k-means++, which reduces computational load while maintaining clustering quality akin to global k-means, yielding very satisfactory results in terms of quality and speed on benchmark datasets.

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of k-means but its well-known that requires high computational cost. It partitions the data to $K$ clusters by solving all $k$-means sub-problems incrementally for all $k=1,\ldots, K$. For each $k$ cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of datapoints. In this paper, we propose the \emph{global $k$-means\texttt{++}} clustering algorithm, which is an effective way of acquiring quality clustering solutions akin to those of global $k$-means with a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means\texttt{++} algorithm. The proposed method has been tested and compared in various benchmark datasets yielding very satisfactory results in terms of clustering quality and execution speed.

View on arXiv PDF Code

Similar