LG ML Oct 15, 2012

The Perturbed Variation

arXiv:1210.4006v1 7 citations

Originality: Incremental advance

AI Analysis

This work addresses the need to measure how similar two distributions are in statistical analysis. The advance is incremental, since it builds on existing tests for whether two samples come from exactly the same distribution.

The paper tackles the problem of determining whether two finite samples come from similar distributions, rather than exactly the same distribution, by introducing a new discrepancy score that optimally perturbs the distributions to fit each other. The results include convergence bounds, hypothesis testing procedures whose statistical power is demonstrated in simulations, and comparisons on real data showing the score's capacity to detect similarity.

We introduce a new discrepancy score between two distributions that gives an indication of their similarity. While much research has been done to determine whether two samples come from exactly the same distribution, much less research has considered the problem of determining whether two finite samples come from similar distributions. The new score gives an intuitive interpretation of similarity: it optimally perturbs the distributions so that they best fit each other. The score is defined between distributions and can be efficiently estimated from samples. We provide convergence bounds for the estimated score and develop hypothesis testing procedures that test whether two data sets come from similar distributions. The statistical power of these procedures is presented in simulations. We also compare the score's capacity to detect similarity with that of other known measures on real data.
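The abstract does not spell out how the score is estimated from samples. As a rough illustration only, the sketch below assumes one natural reading of "optimally perturbing" two samples: points from the two samples may be paired when they lie within a tolerance `eps` of each other, as many pairs as possible are formed, and the score is the average fraction of points left unpaired. The function name `perturbed_variation_estimate`, the parameter `eps`, the Euclidean distance, and the matching routine (a simple augmenting-path method) are all assumptions made for this example, not details taken from the text above.

```python
import math


def perturbed_variation_estimate(xs, ys, eps):
    """Illustrative score: average fraction of points left unpaired.

    xs, ys: non-empty lists of equal-dimension tuples (the two samples).
    eps:    assumed tolerance; points within eps of each other may be paired.
    """
    n, m = len(xs), len(ys)
    # For each point in xs, list the indices of ys it is allowed to pair with.
    adj = [
        [j for j in range(m) if math.dist(xs[i], ys[j]) <= eps]
        for i in range(n)
    ]
    match_y = [-1] * m  # match_y[j] = index in xs paired with ys[j], or -1

    def try_augment(i, seen):
        # Try to pair xs[i], re-pairing earlier points if that frees a slot.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_y[j] == -1 or try_augment(match_y[j], seen):
                match_y[j] = i
                return True
        return False

    # Each successful augmentation grows the matching by exactly one pair.
    matched = sum(1 for i in range(n) if try_augment(i, set()))
    return 0.5 * ((n - matched) / n + (m - matched) / m)
```

Under these assumptions, identical samples score 0 and samples with no points within `eps` of each other score 1. For instance, with `xs = [(0.0,), (1.0,), (2.0,)]`, `ys = [(0.05,), (1.05,), (5.0,)]` and `eps = 0.1`, two pairs can be formed and one point in each sample is left over, so the score is 1/3.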
