LGApr 23

An effective variant of the Hartigan $k$-means algorithm

arXiv:2604.2179845.5

AI Analysis

For practitioners using k-means clustering, this offers a simple tweak to improve results over the already superior Hartigan algorithm.

The paper proposes a minor variation of Hartigan's k-means algorithm that yields an additional 2%-5% improvement in clustering quality, with larger gains for higher dimensions or larger k.

The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases, Telgarsky-Vattani note a typical improvement of $5\%$ -- $10\%$. We point out that a very minor variation of Hartigan's method leads to another $2\%$ -- $5\%$ improvement; the improvement tends to become larger when either dimension or $k$ increase.

View on arXiv PDF

Similar