CGCCDSMar 10

Almost-Optimal Upper and Lower Bounds for Clustering in Low Dimensional Euclidean Spaces

arXiv:2603.09846v13.6h-index: 13
Predicted impact top 43% in CG · last 90 daysOriginality Incremental advance
AI Analysis

This work addresses computational efficiency for clustering algorithms, which is crucial for data analysis and machine learning applications, though it is incremental in improving prior bounds.

The paper tackles the problem of approximating k-median and k-means clustering in low-dimensional Euclidean spaces, improving the running time to near-linear with an exponential dependence on dimension and providing an almost matching lower bound under a computational hypothesis.

The $k$-median and $k$-means clustering objectives are classic objectives for modeling clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative points so as to minimize the sum of the distances (resp. sum of squared distances) from each point to its closest representative. Cohen-Addad, Feldmann, and Saulpic [JACM'21] showed how to obtain a $(1+\varepsilon)$-factor approximation in low-dimensional Euclidean metric for both the $k$-median and $k$-means problems in near-linear time $2^{(1/\varepsilon)^{O(d^2)}} n \cdot \text{polylog}(n)$ (where $d$ is the dimension and $n$ is the number of input points). We improve this running time to $2^{\tilde{O}(1/\varepsilon)^{d-1}} \cdot n \cdot \text{polylog}(n)$, and show an almost matching lower bound: under the Gap Exponential Time Hypothesis for 3-SAT, there is no $2^{{o}(1/\varepsilon^{d-1})} n^{O(1)}$ algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes