DSLGApr 6, 2023

Parameterized Approximation Schemes for Clustering with General Norm Objectives

arXiv:2304.03146v122 citationsh-index: 48
Originality Incremental advance
AI Analysis

This work solves the challenge of developing general approximation algorithms for clustering, benefiting researchers and practitioners in machine learning and algorithms, though it is incremental as it builds on existing EPAS frameworks.

The paper tackles the problem of designing efficient parameterized approximation schemes (EPAS) for a wide range of k-clustering objectives and metric spaces, resulting in a unified algorithm that provides EPASes for over ten clustering problems, including k-means, k-center, and robust k-median, across various metric spaces like high-dimensional Euclidean and bounded doubling dimension.

This paper considers the well-studied algorithmic regime of designing a $(1+ε)$-approximation algorithm for a $k$-clustering problem that runs in time $f(k,ε)poly(n)$ (sometimes called an efficient parameterized approximation scheme or EPAS for short). Notable results of this kind include EPASes in the high-dimensional Euclidean setting for $k$-center [Badŏiu, Har-Peled, Indyk; STOC'02] as well as $k$-median, and $k$-means [Kumar, Sabharwal, Sen; J. ACM 2010]. However, existing EPASes handle only basic objectives (such as $k$-center, $k$-median, and $k$-means) and are tailored to the specific objective and metric space. Our main contribution is a clean and simple EPAS that settles more than ten clustering problems (across multiple well-studied objectives as well as metric spaces) and unifies well-known EPASes. Our algorithm gives EPASes for a large variety of clustering objectives (for example, $k$-means, $k$-center, $k$-median, priority $k$-center, $\ell$-centrum, ordered $k$-median, socially fair $k$-median aka robust $k$-median, or more generally monotone norm $k$-clustering) and metric spaces (for example, continuous high-dimensional Euclidean spaces, metrics of bounded doubling dimension, bounded treewidth metrics, and planar metrics). Key to our approach is a new concept that we call bounded $ε$-scatter dimension--an intrinsic complexity measure of a metric space that is a relaxation of the standard notion of bounded doubling dimension. Our main technical result shows that two conditions are essentially sufficient for our algorithm to yield an EPAS on the input metric $M$ for any clustering objective: (i) The objective is described by a monotone (not necessarily symmetric!) norm, and (ii) the $ε$-scatter dimension of $M$ is upper bounded by a function of $ε$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes