ST · LG · ML · Sep 19, 2019

Generalized Resilience and Robust Statistics

arXiv:1909.08755v3 · 54 citations
Originality: Highly original
AI Analysis

This work helps statisticians and machine learning practitioners handle diverse data corruptions, such as systematic measurement errors and missing covariates, by offering a more flexible framework than traditional outlier-focused methods.

The paper generalizes robust statistics to handle perturbations under any Wasserstein distance. It shows that robust estimation is possible whenever population statistics are resilient to friendly perturbations, and the resulting analysis simplifies and sometimes improves known results for mean estimation, regression, and covariance estimation.

Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience previously employed in the special case of mean estimation with outliers. We justify the generalized resilience property by showing that it holds under moment or hypercontractive conditions. Even in the total variation case, these subsume conditions in the literature for mean estimation, regression, and covariance estimation; the resulting analysis simplifies and sometimes improves these known results in both population limit and finite-sample rate. Our robust estimators are based on minimum distance (MD) functionals (Donoho and Liu, 1988), which project onto a set of distributions under a discrepancy related to the perturbation. We present two approaches for designing MD estimators with good finite-sample rates: weakening the discrepancy and expanding the set of distributions. We also present connections to Gao et al. (2019)'s recent analysis of generative adversarial networks for robust estimation.
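
As an illustration of the special case the abstract mentions (mean estimation with outliers), resilience asks that the mean move only a little when any small fraction of the probability mass is deleted. The sketch below is not taken from the paper: it measures an empirical, one-dimensional version of this quantity, namely the largest shift in the sample mean caused by deleting at most an eps-fraction of the points. The function name `empirical_resilience_1d` and the restriction to whole-point deletions are assumptions made for this example.

```python
from statistics import fmean


def empirical_resilience_1d(sample, eps):
    """Largest shift of the sample mean after deleting at most an
    eps-fraction of the points of a one-dimensional sample.

    Hypothetical helper for illustration; it only considers deleting
    whole points, which is a discretized form of the deletion property.
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    # Number of points that remain after the largest allowed deletion.
    keep = n - int(eps * n)
    full_mean = fmean(xs)

    # In one dimension the extreme deletions remove either the smallest
    # or the largest points, so only two candidate subsets are needed.
    upper_shift = fmean(xs[n - keep:]) - full_mean
    lower_shift = full_mean - fmean(xs[:keep])
    return max(upper_shift, lower_shift)


if __name__ == "__main__":
    clean = [float(i) for i in range(10)]
    heavy = clean[:-1] + [1000.0]  # same sample with one extreme point
    print(empirical_resilience_1d(clean, 0.2))
    print(empirical_resilience_1d(heavy, 0.2))
```

A sample with a single extreme point gives a much larger value than the clean one, which matches the intuition that heavy tails make the mean less resilient to deletions. The paper's framework covers general Wasserstein perturbations and higher dimensions, which this one-dimensional sketch does not attempt.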

