DSCCCGLGDec 4, 2024

On Approximability of $\ell_2^2$ Min-Sum Clustering

arXiv:2412.03332v24 citationsh-index: 42
AI Analysis

This work establishes fundamental limits on the approximability of $\ell_2^2$ min-sum k-clustering, which is important for researchers and practitioners working on clustering algorithms, by providing the first hardness-of-approximation results.

This paper addresses the $\ell_2^2$ min-sum k-clustering problem, which is NP-hard, by providing the first hardness-of-approximation results. It shows that approximating the objective better than 1.056 is NP-hard, and under the Johnson Coverage Hypothesis, this factor increases to 1.327. Additionally, the paper offers a nearly linear time parameterized PTAS and an approximation algorithm for a learning-augmented setting.

The $\ell_2^2$ min-sum $k$-clustering problem is to partition an input set into clusters $C_1,\ldots,C_k$ to minimize $\sum_{i=1}^k\sum_{p,q\in C_i}\|p-q\|_2^2$. Although $\ell_2^2$ min-sum $k$-clustering is NP-hard, it is not known whether it is NP-hard to approximate $\ell_2^2$ min-sum $k$-clustering beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the $\ell_2^2$ min-sum $k$-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than $1.056$ and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness result by giving a nearly linear time parameterized PTAS for $\ell_2^2$ min-sum $k$-clustering running in time $O\left(n^{1+o(1)}d\cdot \exp((k\cdot\varepsilon^{-1})^{O(1)})\right)$, where $d$ is the underlying dimension of the input dataset. Finally, we consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label $i\in[k]$ for input point, thereby implicitly partitioning the input dataset into $k$ clusters that induce an approximately optimal solution, up to some amount of adversarial error $α\in\left[0,\frac{1}{2}\right)$. We give a polynomial-time algorithm that outputs a $\frac{1+γα}{(1-α)^2}$-approximation to $\ell_2^2$ min-sum $k$-clustering, for a fixed constant $γ>0$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes