ML, LG, AP, CO | Sep 14, 2017

Two-sample Statistics Based on Anisotropic Kernels

arXiv:1709.05006v3 | 19 citations
AI Analysis

This work addresses the problem of comparing distributions in high-dimensional data for fields like biomedical imaging, though it appears incremental as it builds on existing kernel-based methods.

The paper introduces an anisotropic kernel-based Maximum Mean Discrepancy statistic for measuring distances between distributions using finite multivariate samples, proving test consistency and a finite-sample power lower bound, with applications in flow cytometry and diffusion MRI datasets.

The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions on the kernel, as long as $\|p-q\| \sqrt{n} \to \infty$, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
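To make the construction concrete, here is a minimal NumPy sketch of a reference-point statistic of this kind. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the kernel formula, so the Gaussian form $a(r, x) = \exp\big(-\tfrac{1}{2}(x-r)^\top \Sigma_r^{-1} (x-r)\big)$, the normalization of the statistic as a mean squared difference over reference points, and the k-nearest-neighbour estimate of the local covariance $\Sigma_r$ are all assumed here. The function names and the parameters `k` and `reg` are hypothetical.

```python
import numpy as np


def local_covariances(refs, pooled, k=30, reg=1e-3):
    """Assumed estimator: covariance of the k nearest pooled points to each
    reference point, plus a small ridge term so the matrix is invertible."""
    d = pooled.shape[1]
    covs = []
    for r in refs:
        sq_dist = np.sum((pooled - r) ** 2, axis=1)
        nearest = np.argsort(sq_dist)[:k]
        covs.append(np.cov(pooled[nearest], rowvar=False) + reg * np.eye(d))
    return np.stack(covs)  # shape (n_R, d, d)


def anisotropic_affinity(refs, covs, X):
    """Asymmetric n_R x n kernel matrix:
    A[j, i] = exp(-0.5 * (x_i - r_j)^T covs[j]^{-1} (x_i - r_j))."""
    diff = X[None, :, :] - refs[:, None, :]  # (n_R, n, d)
    prec = np.linalg.inv(covs)               # (n_R, d, d), batched inverse
    maha = np.einsum("jid,jde,jie->ji", diff, prec, diff)
    return np.exp(-0.5 * maha)


def reference_mmd(X, Y, refs, covs):
    """Mean over reference points of the squared difference between the
    average affinities of the two samples (assumed normalization)."""
    h_x = anisotropic_affinity(refs, covs, X).mean(axis=1)
    h_y = anisotropic_affinity(refs, covs, Y).mean(axis=1)
    return float(np.mean((h_x - h_y) ** 2))


# Toy usage: two 2-D Gaussian samples, the second shifted along the first axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
Y = rng.normal(size=(300, 2)) + np.array([3.0, 0.0])
pooled = np.vstack([X, Y])

# n_R = 20 reference points drawn from the pooled sample, far fewer than n.
refs = pooled[rng.choice(len(pooled), size=20, replace=False)]
covs = local_covariances(refs, pooled)

print("statistic, shifted sample:", reference_mmd(X, Y, refs, covs))
print("statistic, identical sample:", reference_mmd(X, X, refs, covs))
```

Only $n \cdot n_R$ affinities are evaluated, instead of the $n^2$ entries of a symmetric kernel matrix, which is where a small $n_R$ saves work. The sketch computes the statistic only; calibrating a rejection threshold, for example by permuting sample labels, is left out.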

Code Implementations: 1 repo