Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers
This work addresses robust estimation for incomplete data, extending prior efficient methods to more practical scenarios.
The paper tackles robust mean estimation from incomplete data with arbitrary outliers, achieving dimension-independent error guarantees in nearly-linear time.
We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.