Nearly Optimal Clustering Risk Bounds for Kernel K-Means
This work addresses the need for stronger theoretical guarantees for kernel-based clustering methods, which matters to machine learning researchers and practitioners, although the contribution appears incremental because it builds on existing risk analyses.
The paper tackles the problem of improving statistical risk bounds for kernel k-means clustering. It obtains a nearly optimal excess clustering risk bound that substantially improves on state-of-the-art results, and it shows that Nyström approximations with Ω(√(nk)) landmark points achieve the same statistical accuracy as exact kernel k-means.
In this paper, we study the statistical properties of kernel $k$-means and obtain a nearly optimal excess clustering risk bound, substantially improving the state-of-the-art bounds in existing clustering risk analyses. We further analyze the statistical effect of the computational approximation in Nyström kernel $k$-means, and prove that it achieves the same statistical accuracy as exact kernel $k$-means using only $\Omega(\sqrt{nk})$ Nyström landmark points. To the best of our knowledge, such sharp excess clustering risk bounds for kernel (or approximate kernel) $k$-means have not been established before.
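To make the landmark count concrete, the following is a minimal sketch of Nyström kernel $k$-means with $m = \lceil\sqrt{nk}\rceil$ landmarks. It is an illustration under stated assumptions, not the procedure analyzed in the paper: the RBF kernel, uniform landmark sampling, the constant 1 in front of $\sqrt{nk}$, and all function names (`rbf_kernel`, `nystrom_features`, `lloyd_kmeans`, `nystrom_kernel_kmeans`) are choices made here for the example.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_features(X, m, gamma=1.0, rng=None):
    """Map X to a Nystrom feature space built from m uniformly sampled landmarks."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(X.shape[0], size=m, replace=False)  # uniform sampling (assumption)
    landmarks = X[idx]
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    # Z = K_nm K_mm^{-1/2}, computed on the numerically non-null eigenspace of K_mm,
    # so that Z Z^T equals the Nystrom approximation K_nm K_mm^+ K_mn.
    evals, evecs = np.linalg.eigh(K_mm)
    keep = evals > 1e-10 * evals.max()
    return K_nm @ (evecs[:, keep] / np.sqrt(evals[keep]))


def lloyd_kmeans(Z, k, n_iter=50, rng=None):
    """Plain Lloyd iterations on explicit features Z."""
    rng = np.random.default_rng(rng)
    centers = Z[rng.choice(Z.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            members = labels == j
            if members.any():  # leave an empty cluster's center unchanged
                centers[j] = Z[members].mean(0)
    # Final assignment against the last updated centers.
    labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    return labels, centers


def nystrom_kernel_kmeans(X, k, gamma=1.0, seed=0):
    """Cluster X with m = ceil(sqrt(n * k)) landmarks (constant 1 is an assumption)."""
    n = X.shape[0]
    m = min(n, int(np.ceil(np.sqrt(n * k))))
    Z = nystrom_features(X, m, gamma, rng=seed)
    labels, _ = lloyd_kmeans(Z, k, rng=seed)
    return labels, m
```

For example, with $n = 200$ points and $k = 2$ clusters, this sketch uses $m = 20$ landmarks, so it forms a $200 \times 20$ and a $20 \times 20$ kernel block rather than the full $200 \times 200$ kernel matrix.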