MLAILGMEJun 14, 2021

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

arXiv:2106.07636v228 citations
Originality Incremental advance
AI Analysis

This addresses a practical problem in statistics and machine learning for scenarios with small sample sizes, though it is incremental as it builds on existing kernel-based methods.

The paper tackles the challenge of performing two-sample tests with limited data by introducing meta two-sample testing (M2ST), which uses auxiliary data from related tasks to quickly learn effective kernels, resulting in improved performance over direct learning from scarce observations.

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme which improves over baselines and a more tailored approach which performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes out-perform learning kernel-based tests directly from scarce observations, and identify when such schemes will be successful.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes