MLLGSPSTCOMay 28, 2020

Robust estimation via generalized quasi-gradients

arXiv:2005.14073v144 citations
Originality Highly original
AI Analysis

This work addresses robust estimation challenges in machine learning and statistics, providing theoretical insights and efficient algorithms for tasks like mean estimation and linear regression, with incremental improvements in breakdown points and error rates.

The paper tackles the problem of why many robust estimation problems are efficiently solvable despite non-convexity by identifying 'generalized quasi-gradients' that guarantee global minimum approximation with low-regret algorithms. For robust mean estimation, it shows stationary points are approximate global minima if corruption level ε < 1/3, achieving breakdown point 1/3, and improves this to the optimal 1/2 with careful initialization and step size, while for linear regression, it improves error from O(√ε) to the optimal O(ε) for small ε.

We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients". Whenever these quasi-gradients exist, a large family of low-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly-used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an {approximate global minimum} if and only if the corruption level $ε< 1/3$. Consequently, any optimization algorithm that aproaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With careful initialization and step size, we improve this to $1/2$, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrtε)$ to the optimal rate of $O(ε)$ for small $ε$ assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/ε^2)$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes