ML · Oct 17, 2022
Statistical, Robustness, and Computational Guarantees for Sliced Wasserstein Distances
Sloan Nietert, Ritwik Sadhu, Ziv Goldfeld et al. · uw
Sliced Wasserstein distances preserve properties of classic Wasserstein distances while being more scalable for computation and estimation in high dimensions. The goal of this work is to quantify this scalability from three key aspects: (i) empirical convergence rates; (ii) robustness to data contamination; and (iii) efficient computational methods. For empirical convergence, we derive fast rates with explicit dependence of constants on dimension, subject to log-concavity of the population distributions. For robustness, we characterize minimax optimal, dimension-free robust estimation risks, and show an equivalence between robust sliced 1-Wasserstein estimation and robust mean estimation. This enables lifting statistical and algorithmic guarantees available for the latter to the sliced 1-Wasserstein setting. Moving on to computational aspects, we analyze the Monte Carlo estimator for the average-sliced distance, demonstrating that larger dimension can result in faster convergence of the numerical integration error. For the max-sliced distance, we focus on a subgradient-based local optimization algorithm that is frequently used in practice, albeit without formal guarantees, and establish an $O(ε^{-4})$ computational complexity bound for it. Our theory is validated by numerical experiments, which altogether provide a comprehensive quantitative account of the scalability question.
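The Monte Carlo estimator for the average-sliced distance mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `sliced_wasserstein_mc` and its parameters are hypothetical, and it assumes both samples have the same number of points, so that each one-dimensional Wasserstein distance reduces to comparing sorted projections.

```python
import numpy as np


def sliced_wasserstein_mc(X, Y, n_projections=200, p=1, seed=0):
    """Monte Carlo estimate of the average-sliced p-Wasserstein distance.

    X, Y: arrays of shape (n, d) with the SAME n (assumption of this sketch).
    Directions are drawn uniformly from the unit sphere in R^d.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")

    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Normalized Gaussian vectors are uniformly distributed on the sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction: shape (n, n_projections).
    px = X @ theta.T
    py = Y @ theta.T

    # In 1D, the optimal coupling of equal-size samples matches sorted points.
    px.sort(axis=0)
    py.sort(axis=0)
    per_slice = np.mean(np.abs(px - py) ** p, axis=0)  # W_p^p for each direction

    # Average W_p^p over directions, then take the p-th root.
    return float(np.mean(per_slice) ** (1.0 / p))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((500, 10))
    Y = rng.standard_normal((500, 10)) + 0.5
    print(sliced_wasserstein_mc(X, Y, n_projections=500))
```

Increasing `n_projections` reduces the numerical integration error over directions; the abstract's claim is that this error can shrink faster in larger dimension.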
ST · Jul 28, 2021
Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications
Ritwik Sadhu, Ziv Goldfeld, Kengo Kato
The smooth 1-Wasserstein distance (SWD) $W_1^σ$ was recently proposed as a means to mitigate the curse of dimensionality in empirical approximation while preserving the Wasserstein structure. Indeed, SWD exhibits parametric convergence rates and inherits the metric and topological structure of the classic Wasserstein distance. Motivated by the above, this work conducts a thorough statistical study of the SWD, including a high-dimensional limit distribution result for empirical $W_1^σ$, bootstrap consistency, concentration inequalities, and Berry-Esseen type bounds. The derived nondegenerate limit stands in sharp contrast with the classic empirical $W_1$, for which a similar result is known only in the one-dimensional case. We also explore asymptotics and characterize the limit distribution when the smoothing parameter $σ$ is scaled with $n$, converging to $0$ at a sufficiently slow rate. The dimensionality of the sampled distribution enters empirical SWD convergence bounds only through the prefactor (i.e., the constant). We provide a sharp characterization of this prefactor's dependence on the smoothing parameter and the intrinsic dimension. This result is then used to derive new empirical convergence rates for classic $W_1$ in terms of the intrinsic dimension. As applications of the limit distribution theory, we study two-sample testing and minimum distance estimation (MDE) under $W_1^σ$. We establish asymptotic validity of SWD testing, while for MDE, we prove measurability, almost sure convergence, and limit distributions for optimal estimators and their corresponding $W_1^σ$ error. Our results suggest that the SWD is well suited for high-dimensional statistical learning and inference.
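As a rough illustration of the quantity studied here, the sketch below computes $W_1^σ$ between two empirical samples in one dimension only, where $W_1$ equals the integral of the absolute difference of CDFs. It assumes the standard definition of $W_1^σ$ as $W_1$ between the Gaussian-convolved measures; the helper names (`smoothed_cdf`, `smooth_w1_1d`) and the grid parameters are hypothetical, and the one-dimensional restriction is a simplification that does not reflect the high-dimensional setting of the paper.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def smoothed_cdf(samples, grid, sigma):
    """CDF of (empirical measure of `samples`) convolved with N(0, sigma^2).

    This is an equal-weight mixture of Gaussian CDFs centered at the samples.
    """
    z = (grid[:, None] - samples[None, :]) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + _erf(z)).mean(axis=1)


def smooth_w1_1d(x, y, sigma, n_grid=2001, pad=6.0):
    """Numerically integrate |F - G| for the two smoothed CDFs (1D only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pad the grid by several sigma so the Gaussian tails are negligible.
    lo = min(x.min(), y.min()) - pad * sigma
    hi = max(x.max(), y.max()) + pad * sigma
    grid = np.linspace(lo, hi, n_grid)
    diff = np.abs(smoothed_cdf(x, grid, sigma) - smoothed_cdf(y, grid, sigma))
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    y = rng.standard_normal(200) + 0.3
    print(smooth_w1_1d(x, y, sigma=1.0))
```

In higher dimensions no such closed form is available, and an optimal transport solver applied to noise-perturbed samples would be needed instead.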
ST · Jan 11, 2021
Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications
Sloan Nietert, Ziv Goldfeld, Kengo Kato
Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(σ)}$, for arbitrary $p\geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(σ)}$, we explore the asymptotic statistical behavior of $\mathsf{W}_p^{(σ)}(\hat{μ}_n,μ)$, where $\hat{μ}_n$ is the empirical distribution of $n$ independent observations from $μ$. We prove that $\mathsf{W}_p^{(σ)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts with the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(σ)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(σ)}$ and deriving the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(σ)}(\hat{μ}_n,μ)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(σ)}$, with experiments for $p=2$ using a maximum mean discrepancy formulation of $\mathsf{d}_2^{(σ)}$.
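For reference, the Gaussian-smoothed distance described above can be written out as follows. This is a sketch under the standard convention, which the abstract does not state explicitly: $\mathcal{N}_\sigma$ denotes the isotropic Gaussian $\mathcal{N}(0, \sigma^2 \mathrm{I}_d)$ and $*$ denotes convolution of measures. The rate comparison restates the abstract's claim and omits the paper's precise assumptions on $\mu$.

```latex
% Definition: W_p between the Gaussian-convolved measures
\mathsf{W}_p^{(\sigma)}(\mu,\nu)
  := \mathsf{W}_p\bigl(\mu * \mathcal{N}_\sigma,\ \nu * \mathcal{N}_\sigma\bigr),
  \qquad \mathcal{N}_\sigma := \mathcal{N}(0, \sigma^2 \mathrm{I}_d)

% Empirical convergence rates as stated in the abstract (d >= 3)
\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n, \mu) \ \text{decays at rate}\ n^{-1/2},
  \qquad
\mathsf{W}_p(\hat{\mu}_n, \mu) \ \text{decays at rate}\ n^{-1/d}
```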