ST · ML · Dec 31, 2021

Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders

arXiv:2201.00073v3 · 28 citations
Originality: Incremental advance
AI Analysis

This work provides theoretical insights for statisticians and data scientists using kernel-based metrics in high-dimensional settings, though it is incremental as it builds on existing kernel test frameworks.

The paper studies the asymptotic behavior of kernel two-sample tests in high dimensions, deriving central limit theorems and performing exact power analysis to reveal the interplay between detectable moment discrepancies and dimension/sample size orders.

Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with an isotropic kernel, which includes MMD with the Gaussian kernel, MMD with the Laplace kernel, and the energy distance as special cases. We derive asymptotic expansions of the kernel two-sample statistics, based on which we establish the central limit theorem (CLT) under the null hypothesis as well as under local and fixed alternatives. The new non-null CLT results allow us to perform an asymptotically exact power analysis, which reveals a delicate interplay between the moment discrepancy that can be detected by the kernel two-sample tests and the dimension-and-sample orders. The asymptotic theory is further corroborated through numerical studies.
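
For readers unfamiliar with the statistic, here is a minimal sketch of the standard unbiased estimator of squared MMD with a Gaussian kernel. It is not the paper's code. The function names and the bandwidth choice (squared bandwidth equal to the dimension) are assumptions made for illustration only.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between the rows of a and b."""
    d2 = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def gaussian_kernel(d2, bandwidth):
    """Isotropic Gaussian kernel evaluated on squared distances."""
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD between samples x (n x p) and y (m x p)."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(sq_dists(x, x), bandwidth)
    kyy = gaussian_kernel(sq_dists(y, y), bandwidth)
    kxy = gaussian_kernel(sq_dists(x, y), bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 20                               # dimension (illustrative value)
    bandwidth = np.sqrt(p)               # assumed scaling, not taken from the paper
    x = rng.standard_normal((50, p))
    y_null = rng.standard_normal((50, p))        # same distribution as x
    y_shift = rng.standard_normal((50, p)) + 1.0  # mean shift in every coordinate
    print("null:", mmd2_unbiased(x, y_null, bandwidth))
    print("mean shift:", mmd2_unbiased(x, y_shift, bandwidth))
```

Under the null the estimate should be close to zero, while a mean shift should give a clearly positive value. Turning the statistic into a test still requires a critical value, for example from a CLT-based normal approximation of the kind studied here or from a permutation procedure.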
