DS · Apr 1

Round-efficient Fully-scalable MPC algorithms for k-Means

Shaofeng H. -C. Jiang, Yaonan Jin, Jianing Lou, Weicheng Wang

arXiv:2604.00954 · 25.7

Predicted impact: top 56% in DS · last 90 days

Originality: Incremental advance

AI Analysis

The paper provides a more round-efficient algorithm for large-scale clustering in distributed computing; the advance is incremental in that it builds on existing methods to reduce round complexity.

The paper tackles the Euclidean k-Means problem in the Massively Parallel Computation (MPC) model, achieving a fully-scalable O((log n / log log n)^2)-approximation in O(1) rounds. Prior fully-scalable algorithms either needed super-constant rounds (while achieving a better O(1)-approximation) or offered only bicriteria guarantees.

We study Euclidean $k$-Means under the Massively Parallel Computation (MPC) model, focusing on the \emph{fully-scalable} setting. Our main result is a fully-scalable $O((\log n/\log\log n)^2)$-approximation in $O(1)$ rounds. Previously, fully-scalable algorithms for $k$-Means either run in super-constant $O(\log\log n \cdot \log\log\log n)$ rounds, albeit with a better $O(1)$-approximation [Cohen-Addad et al., SODA'26], or suffer from bicriteria guarantees [Bhaskara and Wijewardena, ICML'18; Czumaj et al., ICALP'24]. Our algorithm also gives an $O(\log n/\log\log n)$-approximation for $k$-Median, which improves a recent $O(\log n)$-approximation [Goranci et al., SODA'26], and this $o(\log n)$ ratio breaks the fundamental barrier of tree embedding methods used therein. Our main technical contribution is a new variant of the MP algorithm [Mettu and Plaxton, SICOMP'03] that works for general metrics, whose new guarantee is the Lagrangian Multiplier Preserving (LMP) property, which, importantly, holds even under arbitrary distance distortions. Allowing distance distortion is crucial for efficient MPC implementations and useful for efficient algorithm design in general, whereas preserving the LMP property under distance distortion is known to be a significant technical challenge. As a byproduct of our techniques, we also obtain an $O(1)$-approximation to the optimal \emph{value} in $O(1)$ rounds, which conceptually suggests that achieving a true $O(1)$-approximation (for the solution) in $O(1)$ rounds may be a sensible goal for future study.
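For readers less familiar with the terminology, the standard objectives behind the abstract's guarantees can be written out as follows. This is a sketch in assumed notation ($P$, $C$, $\mathrm{cost}_z$, $\mathrm{OPT}_z$, $\alpha$, $\beta$ are illustrative symbols, not taken from the paper), and the bicriteria form shown is the generic one rather than the specific guarantee of any cited work.

```latex
% Assumed notation (not from the paper): P \subset \mathbb{R}^d is the input
% point set with |P| = n, and C is a candidate set of centers.
\[
  \mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2^{\,z},
  \qquad z = 1 \ (\text{$k$-Median}), \quad z = 2 \ (\text{$k$-Means}).
\]
% alpha-approximation: at most k centers, cost within a factor alpha of optimal.
\[
  |C| \le k, \qquad
  \mathrm{cost}_z(P, C) \;\le\; \alpha \cdot \mathrm{OPT}_z,
  \qquad
  \mathrm{OPT}_z \;=\; \min_{|C^*| \le k} \mathrm{cost}_z(P, C^*).
\]
% Generic (alpha, beta)-bicriteria guarantee: the same cost bound,
% but the solution may open up to beta * k centers instead of k.
\[
  |C| \le \beta k, \qquad
  \mathrm{cost}_z(P, C) \;\le\; \alpha \cdot \mathrm{OPT}_z .
\]
```

In this notation, the abstract's main result corresponds to $z = 2$ with $\alpha = O((\log n/\log\log n)^2)$, and the $k$-Median result to $z = 1$ with $\alpha = O(\log n/\log\log n)$, both with exactly $k$ centers.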
