ML LG ME | Aug 11, 2025

Likelihood Ratio Tests by Kernel Gaussian Embedding

Leonardo V. Santoro, Victor M. Panaretos

arXiv:2508.07982v2 | 2 citations | h-index: 17

Originality: Highly original

AI Analysis

This work addresses nonparametric two-sample testing, the problem of deciding whether two samples were drawn from the same probability distribution. It offers researchers in statistics and machine learning a new test with improved power in challenging settings.

The authors propose a kernel-based test that combines kernel mean and kernel covariance embeddings. These combined embeddings map distinct probability measures to mutually singular Gaussian measures, and the test statistic is built from the relative entropy between the Gaussian embeddings. Empirical results on synthetic and real data show gains in power over state-of-the-art methods, especially in high-dimensional and weak-signal scenarios.

We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability measures to mutually singular Gaussian measures on the kernel's RKHS. Leveraging this "separation of measure phenomenon", we construct a test statistic based on the relative entropy between the Gaussian embeddings, in effect the likelihood ratio. The likelihood ratio is specifically tailored to detect equality versus singularity of two Gaussians, and satisfies a "$0/\infty$" law, in that it vanishes under the null and diverges under the alternative. To implement the test in finite samples, we introduce a regularized version, calibrated by way of permutation. We prove consistency, establish uniform power guarantees under mild conditions, and discuss how our framework unifies and extends prior approaches based on spectrally regularized MMD. Empirical results on synthetic and real data demonstrate remarkable gains in power compared to state-of-the-art methods, particularly in high-dimensional and weak-signal regimes.
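The general shape of such a test can be illustrated with a minimal sketch. It is not the authors' estimator: it swaps the RKHS for a finite-dimensional random Fourier feature map, uses a plain ridge term `lam` as the regularizer, and computes the one-directional Gaussian relative entropy between the two embedded samples. The function names (`rff_features`, `regularized_gaussian_kl`, `permutation_test`) and every default parameter value are assumptions made for illustration only.

```python
import numpy as np


def rff_features(X, W, b):
    # Random Fourier features approximating a Gaussian kernel.
    # Assumption: a finite-dimensional stand-in for the RKHS embedding.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def regularized_gaussian_kl(FX, FY, lam):
    # KL( N(mX, SX + lam*I) || N(mY, SY + lam*I) ) in feature space.
    # The ridge term lam*I is an illustrative regularizer, not the paper's.
    d = FX.shape[1]
    mX, mY = FX.mean(axis=0), FY.mean(axis=0)
    SX = np.atleast_2d(np.cov(FX, rowvar=False, bias=True)) + lam * np.eye(d)
    SY = np.atleast_2d(np.cov(FY, rowvar=False, bias=True)) + lam * np.eye(d)
    diff = mY - mX
    trace_term = np.trace(np.linalg.solve(SY, SX))
    mean_term = diff @ np.linalg.solve(SY, diff)
    _, logdet_X = np.linalg.slogdet(SX)
    _, logdet_Y = np.linalg.slogdet(SY)
    return 0.5 * (trace_term - d + mean_term + logdet_Y - logdet_X)


def permutation_test(X, Y, n_features=50, bandwidth=1.0, lam=1e-2,
                     n_perm=200, seed=0):
    # Calibrate the statistic by reshuffling the pooled sample labels.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    F = rff_features(np.vstack([X, Y]), W, b)
    n = X.shape[0]
    observed = regularized_gaussian_kl(F[:n], F[n:], lam)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(F.shape[0])
        stat = regularized_gaussian_kl(F[idx[:n]], F[idx[n:]], lam)
        exceed += int(stat >= observed)
    # The +1 correction keeps the permutation p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    Y = rng.normal(loc=2.0, size=(100, 2))
    stat, p = permutation_test(X, Y)
    print(f"statistic={stat:.3f}, p-value={p:.4f}")
```

The statistic is zero when the two embedded samples share the same mean and covariance, and it grows as they separate, which mirrors in finite dimensions the "vanishes under the null, diverges under the alternative" behavior described in the abstract. How the regularization level should be chosen is not addressed by this sketch.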
