LGDSSTMLJun 18, 2020

List-Decodable Mean Estimation via Iterative Multi-Filtering

arXiv:2006.10715v224 citations
Originality Highly original
AI Analysis

This addresses a fundamental challenge in robust statistics for machine learning, providing a viable solution for high-dimensional data with heavy-tailed outliers.

The paper tackles the problem of list-decodable mean estimation for bounded covariance distributions, where an unknown fraction of points are inliers and the rest are outliers, by developing an iterative algorithm that achieves near-optimal error with practical efficiency.

We study the problem of {\em list-decodable mean estimation} for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $α$-fraction of points in $T$, where $0< α< 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes