ML AI LG MEJun 14, 2021

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Feng Liu, Wenkai Xu, Jie Lu, Danica J. Sutherland

arXiv:2106.07636v216.028 citationsHas Code

Originality Incremental advance

AI Analysis

This addresses a practical problem in statistics and machine learning for scenarios with small sample sizes, though it is incremental as it builds on existing kernel-based methods.

The paper tackles the challenge of performing two-sample tests with limited data by introducing meta two-sample testing (M2ST), which uses auxiliary data from related tasks to quickly learn effective kernels, resulting in improved performance over direct learning from scarce observations.

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme which improves over baselines and a more tailored approach which performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes out-perform learning kernel-based tests directly from scarce observations, and identify when such schemes will be successful.

View on arXiv PDF Code

Similar