LGNov 22, 2022

Global $k$-means$++$: an effective relaxation of the global $k$-means clustering algorithm

arXiv:2211.12271v347 citationsh-index: 44
Originality Incremental advance
AI Analysis

This work addresses the computational bottleneck in deterministic clustering for data scientists, but it is incremental as it builds on existing k-means++ methods.

The paper tackles the high computational cost of the global k-means algorithm by proposing global k-means++, which reduces computational load while maintaining clustering quality akin to global k-means, yielding very satisfactory results in terms of quality and speed on benchmark datasets.

The $k$-means algorithm is a prevalent clustering method due to its simplicity, effectiveness, and speed. However, its main disadvantage is its high sensitivity to the initial positions of the cluster centers. The global $k$-means is a deterministic algorithm proposed to tackle the random initialization problem of k-means but its well-known that requires high computational cost. It partitions the data to $K$ clusters by solving all $k$-means sub-problems incrementally for all $k=1,\ldots, K$. For each $k$ cluster problem, the method executes the $k$-means algorithm $N$ times, where $N$ is the number of datapoints. In this paper, we propose the \emph{global $k$-means\texttt{++}} clustering algorithm, which is an effective way of acquiring quality clustering solutions akin to those of global $k$-means with a reduced computational load. This is achieved by exploiting the center selection probability that is effectively used in the $k$-means\texttt{++} algorithm. The proposed method has been tested and compared in various benchmark datasets yielding very satisfactory results in terms of clustering quality and execution speed.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes