DSMar 29

Deterministic $k$-Median Clustering in Near-Optimal Time

arXiv:2504.151154.2h-index: 7

Predicted impact top 93% in DS · last 90 daysOriginality Incremental advance

AI Analysis

For researchers in clustering and approximation algorithms, this work narrows the gap between randomized and deterministic k-median algorithms, though the improvement is incremental over prior deterministic results.

The paper addresses the deterministic k-median clustering problem, providing an algorithm that achieves an O(log(n/k))-approximation in near-optimal Õ(nk) time, and establishes a lower bound showing that any deterministic algorithm with this running time must have an approximation ratio of Ω(log n/(log k + log log n)).

The metric $k$-median problem is a textbook clustering problem. As input, we are given a metric space $V$ of size $n$ and an integer $k$, and our task is to find a subset $S \subseteq V$ of at most $k$ `centers' that minimizes the total distance from each point in $V$ to its nearest center in $S$. Mettu and Plaxton [UAI'02] gave a randomized algorithm for $k$-median that computes a $O(1)$-approximation in $\tilde O(nk)$ time. They also showed that any algorithm for this problem with a bounded approximation ratio must have a running time of $Î©(nk)$. Thus, the running time of their algorithm is optimal up to polylogarithmic factors. For deterministic $k$-median, Guha et al.~[FOCS'00] gave an algorithm that computes a $\text{poly}(\log (n/k))$-approximation in $\tilde O(nk)$ time, where the degree of the polynomial in the approximation is unspecified. To the best of our knowledge, this remains the state-of-the-art approximation of any deterministic $k$-median algorithm with this running time. This leads us to the following natural question: What is the best approximation of a deterministic $k$-median algorithm with near-optimal running time? We make progress in answering this question by giving a deterministic algorithm that computes a $O(\log(n/k))$-approximation in $\tilde O(nk)$ time. We also provide a lower bound showing that any deterministic algorithm with this running time must have an approximation ratio of $Î©(\log n/(\log k + \log \log n))$, establishing a gap between the randomized and deterministic settings for $k$-median.

View on arXiv PDF

Similar