Robust Distribution Learning with Local and Global Adversarial Corruptions
This addresses robust distribution learning for machine learning and statistics in adversarial environments, offering a method that interpolates between mean and full distribution estimation, though it is incremental in combining local and global corruptions.
The paper tackles the problem of learning a distribution from samples that are corrupted both globally (by arbitrary modifications) and locally (by bounded perturbations), aiming to minimize Wasserstein distance to the true distribution across all orthogonal projections. The result is an efficient algorithm with error bounded by √(εk) + ρ + Õ(d√k n^{-1/(k∨2)}), which is minimax optimal up to a sub-optimality factor.
We consider learning in an adversarial environment, where an $\varepsilon$-fraction of samples from a distribution $P$ are arbitrarily modified (global corruptions) and the remaining perturbations have average magnitude bounded by $ρ$ (local corruptions). Given access to $n$ such corrupted samples, we seek a computationally efficient estimator $\hat{P}_n$ that minimizes the Wasserstein distance $\mathsf{W}_1(\hat{P}_n,P)$. In fact, we attack the fine-grained task of minimizing $\mathsf{W}_1(Π_\# \hat{P}_n, Π_\# P)$ for all orthogonal projections $Π\in \mathbb{R}^{d \times d}$, with performance scaling with $\mathrm{rank}(Π) = k$. This allows us to account simultaneously for mean estimation ($k=1$), distribution estimation ($k=d$), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + ρ+ \tilde{O}(d\sqrt{k}n^{-1/(k \lor 2)})$ when $P$ has bounded covariance. This guarantee holds uniformly in $k$ and is minimax optimal up to the sub-optimality of the plug-in estimator when $ρ= \varepsilon = 0$. Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.