LG | ML | May 19, 2020

k-sums: another side of k-means

arXiv:2005.09485v1
AI Analysis

This work addresses clustering efficiency and accuracy for data analysis applications, representing an incremental improvement over existing methods.

The paper revisits k-means clustering by introducing a stochastic minimization procedure that reallocates samples to clusters based on proximity to centroids, leading to faster convergence and better local minima compared to k-means and its variants, with performance improvements demonstrated across various datasets.

In this paper, the decades-old clustering method k-means is revisited. The original distortion minimization model of k-means is addressed by a purely stochastic minimization procedure. In each step of the iteration, one sample is tentatively reallocated from one cluster to another. It is moved to the other cluster as long as the reallocation brings the sample closer to the new centroid. This optimization procedure converges faster, and to a better local minimum, than k-means and many of its variants. This fundamental modification of the k-means loop leads to the redefinition of a family of k-means variants. Moreover, a new target function that minimizes the summation of pairwise distances within clusters is presented. We show that it can be solved under the same stochastic optimization procedure. This minimization procedure, built upon the two minimization models, considerably outperforms k-means and its variants under different settings and on different datasets.
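The reallocation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stochastic_reallocation` and `distortion`, the balanced random initialization, the choice of the nearest centroid as the candidate target cluster, and the rule of never emptying a cluster are all assumptions made for illustration. It covers only the first model (distortion); the pairwise-distance target function is not covered.

```python
import numpy as np


def distortion(X, labels, centroids):
    """Sum of squared distances from each sample to its assigned centroid."""
    return float(((X - centroids[labels]) ** 2).sum())


def stochastic_reallocation(X, k, n_epochs=10, seed=0):
    """Hypothetical sketch: visit samples in random order and move a sample
    whenever another centroid is closer than its current one."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: balanced random start so no cluster is empty (needs n >= k).
    labels = rng.permutation(n) % k
    counts = np.bincount(labels, minlength=k)
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)  # per-cluster coordinate sums

    for _ in range(n_epochs):
        moved = 0
        for i in rng.permutation(n):
            src = labels[i]
            if counts[src] == 1:
                continue  # assumption: never empty a cluster
            centroids = sums / counts[:, None]
            d = ((centroids - X[i]) ** 2).sum(axis=1)
            dst = int(d.argmin())
            if dst != src and d[dst] < d[src]:
                # Update both clusters immediately, not once per pass.
                sums[src] -= X[i]
                sums[dst] += X[i]
                counts[src] -= 1
                counts[dst] += 1
                labels[i] = dst
                moved += 1
        if moved == 0:
            break  # no sample wants to move: a local minimum
    return labels, sums / counts[:, None]
```

Unlike the batch k-means loop, the centroids here change after every accepted move. Under this acceptance rule each move strictly lowers the distortion, since the change is n_dst/(n_dst+1) times the squared distance to the new centroid minus n_src/(n_src-1) times the squared distance to the old one, so the loop cannot cycle.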

Code Implementations: 1 repo