LG, DS, IT, ST · Oct 28, 2017

Wasserstein Identity Testing

arXiv:1710.10457v1 · 1 citation
Originality: Highly original
AI Analysis

This paper addresses the high sample requirements of identity testing for large or continuous distributions, offering a more sample-efficient formulation for statisticians and machine learning practitioners.

The paper tackles identity testing under Wasserstein distance for large or continuous supports, achieving nearly optimal worst-case sample complexity and nearly instance-optimal complexity for distributions satisfying a doubling condition.

Uniformity testing and the more general identity testing are well-studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider identity testing in Wasserstein distance (a.k.a. transportation distance or earthmover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (identity testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called "Doubling Condition", we provide nearly instance-optimal sample complexity.
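To make the contrast between the two distances concrete, the following is a minimal sketch, not the paper's testing algorithm. It assumes the special case of distributions on the real line with finite support, where the Wasserstein-1 distance equals the area between the two cumulative distribution functions. The function names `l1_distance` and `wasserstein1_on_line` and the dictionary representation are illustrative choices, not taken from the paper.

```python
def l1_distance(p, q):
    """L1 distance between two discrete distributions given as {point: mass}."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def wasserstein1_on_line(p, q):
    """Wasserstein-1 distance on the real line (assumed 1-D special case).

    Uses the identity W1(p, q) = integral of |F_p(t) - F_q(t)| dt, which for
    finitely supported distributions is a sum over consecutive support points.
    """
    points = sorted(set(p) | set(q))
    total = 0.0
    cdf_gap = 0.0  # running value of F_p - F_q
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)
        total += abs(cdf_gap) * (right - left)
    return total


# Two point masses that sit very close to each other.
p = {0.0: 1.0}
q = {0.001: 1.0}

print(l1_distance(p, q))           # maximal: the supports are disjoint
print(wasserstein1_on_line(p, q))  # small: the mass moves only a short way
```

Here the $L_1$-distance is at its maximum because the supports do not overlap, while the Wasserstein distance reflects how little mass has to move. This is one way to see why a metric-aware distance can be a more forgiving target on large or continuous supports.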
