Interpretable Distribution Features with Maximum Testing Power
This work addresses the need for interpretable distribution comparison in machine learning: it offers a linear-time alternative to quadratic-time tests, with practical applications in data analysis.
The authors tackle the problem of comparing probability distributions by proposing two semimetrics based on interpretable features chosen to maximize test power. On high-dimensional text and image benchmarks, the resulting linear-time tests perform comparably to the state-of-the-art quadratic-time maximum mean discrepancy test while providing human-interpretable explanations.
Two semimetrics on probability distributions are proposed, given as the sum of differences of expectations of analytic functions evaluated at spatial or frequency locations (i.e., features). The features are chosen so as to maximize the distinguishability of the distributions, by optimizing a lower bound on test power for a statistical test using these features. The result is a parsimonious and interpretable indication of how and where two distributions differ locally. An empirical estimate of the test power criterion converges with increasing sample size, ensuring the quality of the returned features. In real-world benchmarks on high-dimensional text and image data, linear-time tests using the proposed semimetrics achieve comparable performance to the state-of-the-art quadratic-time maximum mean discrepancy test, while returning human-interpretable features that explain the test results.
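To make the spatial-location variant concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel as the analytic function, a fixed bandwidth, and hand-picked test locations; the function names, the toy data, and the averaging of squared differences are illustrative assumptions. It omits the normalization and the power-maximizing choice of locations that the abstract describes. Each location costs one pass over the samples, so the total cost is linear in the sample size.

```python
import math
import random


def gaussian_kernel(x, v, bandwidth=1.0):
    """Gaussian kernel k(x, v) = exp(-||x - v||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_feature_differences(xs, ys, locations, bandwidth=1.0):
    """Difference of empirical mean kernel values at each test location.

    Cost is O((len(xs) + len(ys)) * len(locations)): linear in sample size.
    """
    diffs = []
    for v in locations:
        mean_x = sum(gaussian_kernel(x, v, bandwidth) for x in xs) / len(xs)
        mean_y = sum(gaussian_kernel(y, v, bandwidth) for y in ys) / len(ys)
        diffs.append(mean_x - mean_y)
    return diffs


def squared_me_distance(xs, ys, locations, bandwidth=1.0):
    """Average of the squared mean-feature differences over all locations."""
    diffs = mean_feature_differences(xs, ys, locations, bandwidth)
    return sum(d * d for d in diffs) / len(diffs)


rng = random.Random(0)
# Two 1-D samples that differ by a mean shift (illustrative data only).
sample_p = [(rng.gauss(0.0, 1.0),) for _ in range(500)]
sample_q = [(rng.gauss(1.0, 1.0),) for _ in range(500)]
# Hand-picked locations (assumption); the paper optimizes these instead.
locations = [(-1.0,), (0.0,), (2.0,)]

print(mean_feature_differences(sample_p, sample_q, locations))
print(squared_me_distance(sample_p, sample_q, locations))
```

The sign and size of each per-location difference indicate where one sample places more mass than the other, which is the sense in which such features can be read as a local explanation of how the distributions differ.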