MLLGSTCOMEMay 12, 2021

Kernel Thinning

arXiv:2105.05842v1152 citations
Originality Highly original
AI Analysis

This addresses the problem of efficient distribution compression for practitioners in machine learning and statistics, offering a novel method with theoretical guarantees and practical benefits in high dimensions.

The paper tackles the problem of compressing distributions more effectively than i.i.d. sampling or standard thinning by introducing kernel thinning, which reduces an n-point approximation to a √n-point one with comparable worst-case integration error, achieving rates like O_d(n^{-1/2}√log n) for compactly supported distributions, compared to Ω(n^{-1/4}) for i.i.d. samples.

We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}_{\star}$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $\mathbb{P}$ and $O_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $Ω(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes