ST, ML · Jan 11, 2021

Smooth $p$-Wasserstein Distance: Structure, Empirical Approximation, and Statistical Applications

arXiv:2101.04039v3, 41 citations
AI Analysis

The work supports scalable statistical tasks such as two-sample testing and minimum distance estimation in high-dimensional machine learning and statistics, though it is incremental in that it builds on existing Gaussian-smoothing frameworks.

The paper tackles the curse of dimensionality in estimating Wasserstein distances by studying a Gaussian-smoothed version, proving that its empirical estimate converges at the parametric rate $n^{-1/2}$, compared with the slower $n^{-1/d}$ rate for the unsmoothed distance in dimension $d \geq 3$.

Discrepancy measures between probability distributions, often termed statistical distances, are ubiquitous in probability theory, statistics and machine learning. To combat the curse of dimensionality when estimating these distances from data, recent work has proposed smoothing out local irregularities in the measured distributions via convolution with a Gaussian kernel. Motivated by the scalability of this framework to high dimensions, we investigate the structural and statistical behavior of the Gaussian-smoothed $p$-Wasserstein distance $\mathsf{W}_p^{(\sigma)}$, for arbitrary $p\geq 1$. After establishing basic metric and topological properties of $\mathsf{W}_p^{(\sigma)}$, we explore the asymptotic statistical behavior of $\mathsf{W}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, where $\hat{\mu}_n$ is the empirical distribution of $n$ independent observations from $\mu$. We prove that $\mathsf{W}_p^{(\sigma)}$ enjoys a parametric empirical convergence rate of $n^{-1/2}$, which contrasts the $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_p$ when $d \geq 3$. Our proof relies on controlling $\mathsf{W}_p^{(\sigma)}$ by a $p$th-order smooth Sobolev distance $\mathsf{d}_p^{(\sigma)}$ and deriving the limit distribution of $\sqrt{n}\,\mathsf{d}_p^{(\sigma)}(\hat{\mu}_n,\mu)$, for all dimensions $d$. As applications, we provide asymptotic guarantees for two-sample testing and minimum distance estimation using $\mathsf{W}_p^{(\sigma)}$, with experiments for $p=2$ using a maximum mean discrepancy formulation of $\mathsf{d}_2^{(\sigma)}$.
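
To make the smoothing idea concrete, here is a minimal sketch of a Monte Carlo proxy for the smoothed distance in one dimension, taking $\mathsf{W}_p^{(\sigma)}(\mu,\nu)$ to mean $\mathsf{W}_p$ between $\mu$ and $\nu$ after each is convolved with an isotropic Gaussian of standard deviation $\sigma$. It is not the estimator used in the paper's experiments, which rely on a maximum mean discrepancy formulation of $\mathsf{d}_2^{(\sigma)}$. The function names `empirical_wp_1d` and `smooth_wp_1d`, the restriction to $d=1$ with equal sample sizes, and the `draws` parameter are all assumptions made for illustration.

```python
import random


def empirical_wp_1d(xs, ys, p=2.0):
    """W_p between two equal-size empirical measures on the real line.

    In one dimension the optimal coupling matches sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


def smooth_wp_1d(xs, ys, sigma=1.0, p=2.0, draws=20, seed=0):
    """Monte Carlo proxy (hypothetical helper) for the smoothed distance.

    Each data point is replicated `draws` times with independent
    N(0, sigma^2) noise, which yields samples from the empirical
    measure convolved with a Gaussian; W_p is then computed between
    the two noisy samples.
    """
    rng = random.Random(seed)
    xs_noisy = [x + rng.gauss(0.0, sigma) for x in xs for _ in range(draws)]
    ys_noisy = [y + rng.gauss(0.0, sigma) for y in ys for _ in range(draws)]
    return empirical_wp_1d(xs_noisy, ys_noisy, p)


if __name__ == "__main__":
    data_rng = random.Random(42)
    mu_sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
    nu_sample = [data_rng.gauss(1.0, 1.0) for _ in range(200)]
    print(smooth_wp_1d(mu_sample, nu_sample, sigma=1.0, p=2.0))
```

Setting `sigma=0.0` recovers the unsmoothed empirical distance, which gives a quick sanity check. In higher dimensions the sorted matching no longer applies and a general optimal transport solver would be needed.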

