DSLGSTMLAug 18, 2020

Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

arXiv:2008.08071v57 citations
AI Analysis

This work addresses robust estimation for incomplete data, extending prior efficient methods to more practical scenarios.

The paper tackles robust mean estimation from incomplete data with arbitrary outliers, achieving dimension-independent error guarantees in nearly-linear time.

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes