DSLGMLDec 4, 2023

Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression

CMU
arXiv:2312.01547v19 citationsh-index: 48NIPS
Originality Incremental advance
AI Analysis

It solves fundamental statistical problems with practical efficiency, offering incremental algorithmic improvements for robust machine learning.

The paper tackles robust mean estimation and linear regression for Gaussian data with Huber contamination, achieving near-optimal sample complexity of O(d/ε^2) and almost linear runtime with error O(ε), improving prior suboptimal results.

We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination parameter $ε\in (0, ε_0)$ for a small absolute constant $ε_0$, we give an algorithm with sample complexity $n = \tilde{O}(d/ε^2)$ and almost linear runtime that approximates the target mean within $\ell_2$-error $O(ε)$. This improves on prior work that achieved this error guarantee with polynomially suboptimal sample and time complexity. For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/ε^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(ε)$. This is the first polynomial sample and time algorithm achieving the optimal error guarantee, answering an open question in the literature. At the technical level, we develop a methodology that yields almost-linear time algorithms for multi-directional filtering that may be of broader interest.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes