LG, DS, IT, OC, ST, ML | Nov 28, 2022

Sketch-and-solve approaches to k-means clustering by semidefinite programming

arXiv:2211.15744v1 | 3 citations | h-index: 33
AI Analysis

This work addresses the efficiency and certification of clustering, both of which matter to data scientists, but it is incremental because it builds on existing semidefinite relaxation methods.

The paper tackles the computational bottleneck of the Peng-Wei semidefinite relaxation of k-means clustering with a sketch-and-solve approach. The approach speeds up the relaxation and provides a high-confidence lower bound on the optimal k-means value without any assumptions on the data.

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated, we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.
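
To make the role of the relaxation concrete, the following minimal sketch illustrates the standard lifting behind the Peng-Wei program, not the paper's own code. Any clustering into k nonempty clusters defines a matrix X (block-constant, with entries 1/|C_t| inside cluster C_t) that has trace k, unit row sums, and nonnegative entries, and whose objective (1/2) Tr(DX) equals the k-means value, where D holds the squared pairwise distances. Minimizing over all such positive semidefinite X can therefore only give a value at or below the k-means optimum, which is why the relaxation yields a lower bound. The function names and the toy data are hypothetical, and the SDP solve and the sketching step are omitted.

```python
# Illustration only: function names and toy data are assumptions,
# and no SDP is solved here.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_value(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dim = len(points[0])
    total = 0.0
    for t in range(k):
        members = [p for p, lab in zip(points, labels) if lab == t]
        if not members:
            continue
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total

def lifted_matrix(labels, k):
    """X[i][j] = 1/|C_t| if i and j share cluster t, else 0."""
    sizes = [labels.count(t) for t in range(k)]
    n = len(labels)
    return [
        [1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def lifted_objective(points, X):
    """(1/2) * Tr(D X), with D[i][j] the squared distance between points i and j."""
    n = len(points)
    return 0.5 * sum(
        sq_dist(points[i], points[j]) * X[i][j] for i in range(n) for j in range(n)
    )

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
k = 2

X = lifted_matrix(labels, k)
trace = sum(X[i][i] for i in range(len(points)))   # equals k
row_sums = [sum(row) for row in X]                 # each equals 1
print(kmeans_value(points, labels, k), lifted_objective(points, X))  # 4.0 4.0
```

Because the two printed values agree for every clustering, a solver's optimal value for the relaxed program can be compared directly against the k-means value of a candidate clustering, for example one produced by k-means++.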

Code Implementations: 1 repo
