ML LG ST MEJan 14, 2023

Compress Then Test: Powerful Kernel Testing in Near-linear Time

Carles Domingo-Enrich, Raaz Dwivedi, Lester Mackey

HarvardMIT

arXiv:2301.05974v316.216 citationsh-index: 40Has Code

Originality Highly original

AI Analysis

This work addresses the computational bottleneck in kernel testing for statisticians and machine learning practitioners, offering a significant speed improvement while preserving statistical power.

The authors tackled the problem of slow kernel two-sample testing by introducing Compress Then Test (CTT), a framework that compresses samples into coresets to achieve near-linear runtime while maintaining optimal detection power, providing 20-200x speed-ups over existing methods with no loss of power.

Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address these shortcomings, we introduce Compress Then Test (CTT), a new framework for high-powered kernel testing based on sample compression. CTT cheaply approximates an expensive test by compressing each $n$ point sample into a small but provably high-fidelity coreset. For standard kernels and subexponential distributions, CTT inherits the statistical behavior of a quadratic-time test -- recovering the same optimal detection boundary -- while running in near-linear time. We couple these advances with cheaper permutation testing, justified by new power analyses; improved time-vs.-quality guarantees for low-rank approximation; and a fast aggregation procedure for identifying especially discriminating kernels. In our experiments with real and simulated data, CTT and its extensions provide 20--200x speed-ups over state-of-the-art approximate MMD tests with no loss of power.

View on arXiv PDF Code

Similar