MLLGSTMEFeb 19, 2025

A Scalable Nyström-Based Kernel Two-Sample Test with Permutations

arXiv:2502.13570v33 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses a computational bottleneck for statisticians and machine learning practitioners in nonparametric testing, though it is incremental as it builds on existing MMD methods.

The paper tackles the high computational cost of maximum mean discrepancy (MMD) in large-scale two-sample hypothesis testing by proposing a Nyström approximation, achieving a test with finite-sample power bounds that match the minimax optimal rate.

Two-sample hypothesis testing-determining whether two sets of data are drawn from the same distribution-is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due to its flexibility and strong theoretical foundations. However, its use in large-scale scenarios is plagued by high computational costs. In this work, we use a Nyström approximation of the MMD to design a computationally efficient and practical testing algorithm while preserving statistical guarantees. Our main result is a finite-sample bound on the power of the proposed test for distributions that are sufficiently separated with respect to the MMD. The derived separation rate matches the known minimax optimal rate in this setting. We support our findings with a series of numerical experiments, emphasizing applicability to realistic scientific data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes