ML · LG · ST · ME · Jun 14, 2023

MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting

arXiv:2306.08777v2 · 39 citations · h-index: 66
Originality: Highly original
AI Analysis

This work addresses the need for more powerful and efficient kernel-based statistical tests in machine learning, offering a novel approach that avoids data splitting and applies broadly to permutation-based MMD testing.

The authors improve the power of two-sample tests based on the Maximum Mean Discrepancy (MMD) by adaptively combining kernels without data splitting. They evaluate the resulting test on synthetic and real-world data and compare its power against current state-of-the-art kernel tests.

We propose novel statistics which maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. Exponential concentration bounds are proved for our proposed statistics under the null and alternative. We further show how these kernels can be chosen in a data-dependent but permutation-independent way, in a well-calibrated test, avoiding data splitting. This technique applies more broadly to general permutation-based MMD testing, and includes the use of deep kernels with features learnt using unsupervised models such as auto-encoders. We highlight the applicability of our MMD-FUSE test on both synthetic low-dimensional and real-world high-dimensional data, and compare its performance in terms of power against current state-of-the-art kernel tests.
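The combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian kernels, the uniform kernel weights, the normalisation by the root mean squared off-diagonal kernel value of the pooled sample, the default value of `lam`, and all function names are assumptions made for illustration. The normaliser is computed on the pooled sample, so it is unchanged by permutations, and the p-value comes from a standard permutation test.

```python
import numpy as np


def gaussian_gram(z, bandwidth):
    """Gaussian kernel Gram matrix on the pooled sample z (shape N x d)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(gram, idx_x, idx_y):
    """Unbiased MMD^2 estimate read off a pooled Gram matrix."""
    kxx = gram[np.ix_(idx_x, idx_x)]
    kyy = gram[np.ix_(idx_y, idx_y)]
    kxy = gram[np.ix_(idx_x, idx_y)]
    n, m = len(idx_x), len(idx_y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()


def fused_statistic(grams, norms, idx_x, idx_y, lam):
    """Soft maximum (log-mean-exp, uniform weights) of normalised MMD^2 values."""
    vals = np.array(
        [mmd2_unbiased(g, idx_x, idx_y) / s for g, s in zip(grams, norms)]
    )
    a = lam * vals
    # Subtract the maximum before exponentiating for numerical stability.
    return (a.max() + np.log(np.mean(np.exp(a - a.max())))) / lam


def fuse_permutation_test(x, y, bandwidths, lam=None, n_perms=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n, total = len(x), len(x) + len(y)
    if lam is None:
        lam = np.sqrt(n * (n - 1))  # assumed default, not taken from the source
    grams = [gaussian_gram(z, b) for b in bandwidths]
    # Permutation-invariant normaliser: depends only on the pooled sample.
    off_diag = ~np.eye(total, dtype=bool)
    norms = [np.sqrt(np.mean(g[off_diag] ** 2)) for g in grams]
    observed = fused_statistic(
        grams, norms, np.arange(n), np.arange(n, total), lam
    )
    null = np.empty(n_perms)
    for b in range(n_perms):
        perm = rng.permutation(total)
        null[b] = fused_statistic(grams, norms, perm[:n], perm[n:], lam)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perms)
    return observed, p_value
```

Because the Gram matrices and normalisers are computed once on the pooled sample, each permutation only re-indexes them, and no data is held out for kernel selection. Rejecting when the p-value is at most the chosen level gives the usual permutation-test calibration.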
