ML LG
May 12, 2025

Wasserstein Distributionally Robust Nonparametric Regression

arXiv:2505.07967v2
3 citations
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses robustness in statistical learning for high-dimensional settings. The advance is incremental: it extends existing parametric methods to nonparametric frameworks.

The paper tackles nonparametric regression under model uncertainty using Wasserstein distributionally robust optimization. It establishes that the order of the Wasserstein distance determines the type of regularization induced, and it derives convergence rates for neural network estimators of order $n^{-2\beta/(d+2\beta)}$, up to logarithmic factors, that are minimax optimal under the paper's stated conditions.
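To get a feel for how the quoted rate behaves, the following minimal sketch evaluates the exponent $2\beta/(d+2\beta)$ for a few dimensions. It treats $\beta$ as a given positive number and ignores constants and logarithmic factors; the function names (`rate_exponent`, `rate`, `samples_needed`) are hypothetical helpers introduced here for illustration, not anything defined in the paper.

```python
def rate_exponent(beta: float, d: int) -> float:
    """Exponent 2*beta / (d + 2*beta) appearing in n^{-2*beta/(d+2*beta)}."""
    if beta <= 0 or d <= 0:
        raise ValueError("beta and d must be positive")
    return 2.0 * beta / (d + 2.0 * beta)


def rate(n: int, beta: float, d: int) -> float:
    """Value of n^{-2*beta/(d+2*beta)}, ignoring constants and log factors."""
    return n ** (-rate_exponent(beta, d))


def samples_needed(target: float, beta: float, d: int) -> float:
    """Real-valued n at which n^{-exponent} equals the target error level."""
    return target ** (-1.0 / rate_exponent(beta, d))


if __name__ == "__main__":
    # Illustrative values only: beta = 2 and n = 10,000 are assumptions.
    for d in (1, 10, 100):
        print(f"d={d:>3}  exponent={rate_exponent(2.0, d):.3f}  "
              f"rate={rate(10_000, 2.0, d):.4f}")
```

The exponent shrinks toward zero as $d$ grows with $\beta$ fixed, which is the usual curse of dimensionality for nonparametric rates of this form.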

Wasserstein distributionally robust optimization (WDRO) strengthens statistical learning under model uncertainty by minimizing the local worst-case risk within a prescribed ambiguity set. Although WDRO has been extensively studied in parametric settings, its theoretical properties in nonparametric frameworks remain underexplored. This paper investigates WDRO for nonparametric regression. We first establish a structural distinction based on the order $k$ of the Wasserstein distance, showing that $k=1$ induces Lipschitz-type regularization, whereas $k > 1$ corresponds to gradient-norm regularization. To address model misspecification, we analyze the excess local worst-case risk, deriving non-asymptotic error bounds for estimators constructed using norm-constrained feedforward neural networks. This analysis is supported by new covering number and approximation bounds that simultaneously control both the function and its gradient. The proposed estimator achieves a convergence rate of $n^{-2\beta/(d+2\beta)}$ up to logarithmic factors, where $\beta$ depends on the target's smoothness and network parameters. This rate is shown to be minimax optimal under conditions commonly satisfied in high-dimensional settings. Moreover, these bounds on the excess local worst-case risk imply guarantees on the excess natural risk, ensuring robustness against any distribution within the ambiguity set. We show the framework's generality across regression and classification problems. Simulation studies and an application to the MNIST dataset further illustrate the estimator's robustness.
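For orientation, the contrast between the two regimes can be written in the form it commonly takes in the WDRO literature. This is a heuristic sketch, not the paper's own statement: the symbols $h$, $Z$, $P$, $\delta$, and $k^*$ are assumed here for illustration, $h$ stands for the loss composed with the regression function, and the $k > 1$ expansion holds only under regularity conditions that are not spelled out.

```latex
\begin{align*}
% Local worst-case risk over a Wasserstein ball of radius \delta around P
\mathcal{R}_{k,\delta}(h)
  &= \sup_{Q \,:\, W_k(Q, P) \le \delta} \mathbb{E}_{Q}\big[h(Z)\big], \\[4pt]
% k = 1: Lipschitz-type penalty (Kantorovich--Rubinstein duality gives the bound)
\mathcal{R}_{1,\delta}(h)
  &\le \mathbb{E}_{P}\big[h(Z)\big] + \delta \,\mathrm{Lip}(h), \\[4pt]
% k > 1: first-order expansion in \delta, with 1/k + 1/k^* = 1
\mathcal{R}_{k,\delta}(h)
  &= \mathbb{E}_{P}\big[h(Z)\big]
     + \delta \,\Big( \mathbb{E}_{P}\big[\|\nabla h(Z)\|^{k^*}\big] \Big)^{1/k^*}
     + o(\delta).
\end{align*}
```

In both cases the robust objective behaves like the ordinary risk plus a penalty scaled by the radius $\delta$. For $k=1$ the penalty depends on a global Lipschitz constant, while for $k>1$ it is an average of the gradient norm under $P$, which is why controlling a function and its gradient simultaneously matters for the analysis.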
