LGDSMay 8

Simple KNN-Based Outlier Detection Achieves Robust Clustering

arXiv:2605.0713025.9
Predicted impact top 77% in LG · last 90 daysOriginality Incremental advance
AI Analysis

For practitioners needing robust clustering, this work provides a simple, efficient heuristic that matches or exceeds the performance of more sophisticated methods.

The paper shows that a simple KNN-based outlier detection method achieves robust k-Means clustering with constant-factor approximation guarantees under practical assumptions, outperforming or matching more complex algorithms on real-world datasets.

Being robust to the presence of outliers is crucial for applying clustering algorithms in practice. In the $\textit{robust $k$-Means}$ problem (i.e., $k$-Means with outliers), the goal is to remove $z$ outliers and minimize the $k$-Means cost on the remaining points. Despite the close connection between robust $k$-Means and outlier detection, both theoretical and empirical understanding of the effectiveness of $\textit{classic outlier detection heuristics}$ for robust $k$-Means remains limited. In this paper, we prove that under a practical assumption on the optimal cluster sizes, simply removing points with large $K$-Nearest-Neighbor distances achieves performance comparable to prior work in terms of approximation guarantees: it yields a constant-factor reduction from robust $k$-Means to standard $k$-Means, without introducing additional centers or discarding extra outliers, as is commonly required by existing approaches. Empirically, experiments on real-world datasets show that our method outperforms or matches several more sophisticated algorithms in terms of clustering cost and runtime. These results demonstrate that simple KNN-based heuristics can be surprisingly effective for robust clustering, highlighting new opportunities to bridge techniques from outlier detection and clustering.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes