DSCGLGMLOct 28, 2025

Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

arXiv:2510.24621v1h-index: 1
Originality Highly original
AI Analysis

This provides more efficient coreset construction for robust geometric median problems, which is important for handling outliers in clustering and data summarization tasks, representing a significant theoretical improvement over prior methods.

The paper tackles the robust geometric median problem by constructing coresets that approximate the robust cost for all centers with multiplicative error ε, eliminating the O(m) dependency on outlier count m present in prior work. For d-dimensional Euclidean space, they achieve coreset size Õ(ε⁻²·min{ε⁻²,d}) when n≥4m, and for d=1, they obtain optimal size Õ(ε⁻¹/² + (m/n)ε⁻¹), with empirical results showing improved size-accuracy tradeoffs and runtime.

We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction.A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $\varepsilon$. Given an outlier count $m$, we construct a coreset of size $\tilde{O}(\varepsilon^{-2} \cdot \min\{\varepsilon^{-2}, d\})$ when $n \geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022 & 2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $\tildeΘ(\varepsilon^{-1/2} + \frac{m}{n} \varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them.Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes