On the tightness of an SDP relaxation of k-means
This provides theoretical guarantees for a specific clustering method, but it is incremental as it builds on prior work on SDP relaxations.
The paper tackles the problem of proving exact recovery conditions for an SDP relaxation of k-means clustering under a random data model, showing that it recovers planted clusters with probability 1 - e^{-Ω(n)} when ball centers are sufficiently separated.
Recently, Awasthi et al. introduced an SDP relaxation of the $k$-means problem in $\mathbb R^m$. In this work, we consider a random model for the data points in which $k$ balls of unit radius are deterministically distributed throughout $\mathbb R^m$, and then in each ball, $n$ points are drawn according to a common rotationally invariant probability distribution. For any fixed ball configuration and probability distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these planted clusters with probability $1-e^{-Ω(n)}$ provided the distance between any two of the ball centers is $>2+ε$, where $ε$ is an explicit function of the configuration of the ball centers, and can be arbitrarily small when $m$ is large.