Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets
This work addresses high-dimensional clustering problems in data analysis and machine learning, offering incremental improvements in approximation ratios.
The paper tackles the Euclidean k-median and k-means clustering problems by proposing a new primal-dual algorithm, achieving approximation ratios of 2.406 and 5.912, respectively, which improve upon previous results of 2.633 and 6.1291.
Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward. Our algorithm achieves an approximation ratio of $2.406$ and $5.912$ for Euclidean $k$-median and $k$-means, respectively, improving upon the 2.633 approximation ratio of Ahmadian et al. and the 6.1291 approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat. Our techniques involve a much stronger exploitation of the Euclidean metric than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a "nested quasi-independent set". In turn, this technique may be of interest for other optimization problems in Euclidean and $\ell_p$ metric spaces.