A Geometric Approach to $k$-means
This addresses the fundamental issue of local optima in k-means clustering for scientific and engineering domains, representing an incremental improvement over existing methods.
The paper tackles the nonconvex optimization problem in k-means clustering by proposing a geometric algorithmic framework to escape undesirable local solutions and recover the global or ground truth clustering, with theoretical justifications and extensive experiments demonstrating its efficacy.
\kmeans clustering is a fundamental problem in many scientific and engineering domains. The optimization problem associated with \kmeans clustering is nonconvex, for which standard algorithms are only guaranteed to find a local optimum. Leveraging the hidden structure of local solutions, we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution or the ground truth clustering. This framework consists of iteratively alternating between two steps: (i) detect mis-specified clusters in a local solution, and (ii) improve the local solution by non-local operations. We discuss specific implementation of these steps, and elucidate how the proposed framework unifies many existing variants of \kmeans algorithms through a geometric perspective. We also present two natural variants of the proposed framework, where the initial number of clusters may be over- or under-specified. We provide theoretical justifications and extensive experiments to demonstrate the efficacy of the proposed approach.