STLGJul 19, 2018

Partial recovery bounds for clustering with the relaxed $K$means

arXiv:1807.07547v369 citations
Originality Incremental advance
AI Analysis

This work addresses clustering accuracy problems for researchers in machine learning and statistics, offering incremental improvements in theoretical bounds for specific models.

The paper tackles clustering performance by analyzing the relaxed K-means algorithm in sub-Gaussian Mixture Models and Stochastic Block Models, proving that misclassification error decays exponentially with signal-to-noise ratio and improving upon existing results in these settings.

We investigate the clustering performances of the relaxed $K$means in the setting of sub-Gaussian Mixture Model (sGMM) and Stochastic Block Model (SBM). After identifying the appropriate signal-to-noise ratio (SNR), we prove that the misclassification error decay exponentially fast with respect to this SNR. These partial recovery bounds for the relaxed $K$means improve upon results currently known in the sGMM setting. In the SBM setting, applying the relaxed $K$means SDP allows to handle general connection probabilities whereas other SDPs investigated in the literature are restricted to the assortative case (where within group probabilities are larger than between group probabilities). Again, this partial recovery bound complements the state-of-the-art results. All together, these results put forward the versatility of the relaxed $K$means.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes