LGApr 28, 2025

Radius-Guided Post-Clustering for Shape-Aware, Scalable Refinement of k-Means Results

arXiv:2504.20293v1
Originality Synthesis-oriented
AI Analysis

This is an incremental improvement for scalable clustering in distributed systems.

The paper tackles the problem of k-means clustering underperforming on non-convex shapes and requiring a pre-specified number of clusters by proposing a radius-guided post-clustering method that merges overlapping clusters after standard k-means, often reconstructing non-convex shapes when k is overestimated, and achieves high accuracy on benchmark datasets with minimal additional computation.

Traditional k-means clustering underperforms on non-convex shapes and requires the number of clusters k to be specified in advance. We propose a simple geometric enhancement: after standard k-means, each cluster center is assigned a radius (the distance to its farthest assigned point), and clusters whose radii overlap are merged. This post-processing step loosens the requirement for exact k: as long as k is overestimated (but not excessively), the method can often reconstruct non-convex shapes through meaningful merges. We also show that this approach supports recursive partitioning: clustering can be performed independently on tiled regions of the feature space, then globally merged, making the method scalable and suitable for distributed systems. Implemented as a lightweight post-processing step atop scikit-learn's k-means, the algorithm performs well on benchmark datasets, achieving high accuracy with minimal additional computation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes