LG, Jan 27, 2025

Fixed-sized clusters $k$-Means

arXiv:2501.16113v1, 1 citation, h-index: 47
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for balanced clustering in applications such as data partitioning, but the contribution is incremental: it combines standard k-means with a known optimization method, the Hungarian algorithm.

The paper tackles k-means clustering with prescribed cluster sizes, of which balanced clustering is a special case. It uses the Hungarian algorithm in the assignment phase, which enables clustering of datasets with more than 5000 points.

We present a $k$-means-based clustering algorithm that optimizes the mean square error for given cluster sizes. A straightforward application is balanced clustering, where all clusters have equal size. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm. The assignment phase therefore has time complexity $O(n^3)$, which enables clustering of datasets of more than 5000 points.
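
The assignment phase described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fixed_size_kmeans`, its parameters, and the random initialization are assumptions made for this example. Each cluster is expanded into as many "slots" as its prescribed size, which gives a square $n \times n$ cost matrix. SciPy's `linear_sum_assignment` then solves the resulting assignment problem optimally, although its internal solver need not be the Hungarian algorithm itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fixed_size_kmeans(X, sizes, n_iter=50, seed=0):
    """Hypothetical sketch: k-means with prescribed (positive) cluster sizes."""
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    n, k = len(X), len(sizes)
    if sizes.sum() != n or (sizes <= 0).any():
        raise ValueError("sizes must be positive and sum to the number of points")

    # slot_cluster[s] is the cluster owning slot s; cluster j owns sizes[j] slots.
    slot_cluster = np.repeat(np.arange(k), sizes)

    # Assumption: initialize centroids from k distinct random points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = None

    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # One column per slot gives an n x n assignment cost matrix.
        cost = d2[:, slot_cluster]
        rows, cols = linear_sum_assignment(cost)

        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_cluster[cols]
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update phase: each centroid becomes the mean of its assigned points.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centroids


if __name__ == "__main__":
    pts = np.random.default_rng(1).normal(size=(12, 2))
    labels, _ = fixed_size_kmeans(pts, sizes=[4, 4, 4])
    print(np.bincount(labels))  # every cluster receives exactly 4 points
```

Because every slot is matched to exactly one point, the prescribed sizes hold after every iteration. The dense cost matrix needs $O(n^2)$ memory, which, together with the cubic-time assignment step, is in line with the abstract's focus on datasets of a few thousand points.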

