ML · LG · ME · Sep 5, 2023

Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test

Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani

arXiv:2309.02422v44.31 citations · h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses two-sample testing in machine learning and statistics, offering a new method that connects integral probability metrics with neural networks, though the contribution is incremental in that it extends classical tests.

The paper tackles nonparametric two-sample testing by introducing the Radon-Kolmogorov-Smirnov (RKS) test, which generalizes the Kolmogorov-Smirnov test to multiple dimensions and higher orders of smoothness. Because the function achieving the test's maximum mean difference is a single neuron, the underlying optimization can be carried out with neural network tools, and the authors show that the test has asymptotically full power against any pair of distinct distributions.
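To make the "generalization of KS" concrete, here is a minimal sketch of the degree k = 0 case. It assumes the convention that a degree-0 ridge spline is a halfspace indicator 1{w·x > b} with ‖w‖ = 1, so the statistic is the largest gap between the fractions of each sample falling in some halfspace. The function name `halfspace_rks0` and the random-direction search are illustrative assumptions, not the authors' implementation. In one dimension the only unit directions are ±1, so the statistic coincides with the classical two-sample KS statistic.

```python
import numpy as np


def halfspace_rks0(x, y, n_directions=200, seed=0):
    """Hypothetical degree-0 sketch: max over unit directions w and thresholds b
    of |frac(x @ w > b) - frac(y @ w > b)|. Directions are sampled at random,
    so in d > 1 this only lower-bounds the true supremum."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)  # treat 1-D input as n points in R^1
    y = np.asarray(y, dtype=float)
    y = y.reshape(y.shape[0], -1)
    rng = np.random.default_rng(seed)
    # Random directions on the unit sphere (an assumed search strategy).
    dirs = rng.standard_normal((n_directions, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    best = 0.0
    for w in dirs:
        px, py = x @ w, y @ w
        # Halfspace fractions change only at projected data points, so those
        # are the only thresholds that need checking for a fixed direction.
        b = np.concatenate([px, py])
        frac_x = (px[None, :] > b[:, None]).mean(axis=1)
        frac_y = (py[None, :] > b[:, None]).mean(axis=1)
        best = max(best, float(np.abs(frac_x - frac_y).max()))
    return best
```

For example, `halfspace_rks0(a, c)` on two 1-D arrays returns the same value as the textbook KS statistic sup_t |F_a(t) - F_c(t)|; for multivariate data, raising `n_directions` tightens the approximation.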

Integral probability metrics (IPMs) constitute a general class of nonparametric two-sample tests that are based on maximizing the mean difference between samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the IPM defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness degree $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. We can thus leverage the power of modern neural network optimization toolkits to (approximately) maximize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
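The abstract notes that, since the witness is a single neuron, neural network optimization can (approximately) maximize the RKS criterion. The sketch below illustrates that idea for k = 1 with plain projected (sub)gradient ascent on one ReLU unit f(z) = max(w·z - b, 0), keeping ‖w‖ = 1. The function name, step size, iteration count, and the choice to leave b unconstrained are all assumptions for illustration; the paper's exact constraint set and optimizer may differ.

```python
import numpy as np


def relu_neuron_witness(x, y, steps=500, lr=0.05, seed=0):
    """Hypothetical k = 1 sketch: search for a unit-norm ReLU neuron
    f(z) = max(w @ z - b, 0) that locally maximizes |mean f(x) - mean f(y)|.
    Returns (best_gap, w, b). b is left unconstrained (an assumption)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[1])
    w /= np.linalg.norm(w)  # start on the unit sphere
    b = 0.0

    def gap(w, b):
        # Signed mean difference of the neuron's output between the samples.
        return np.maximum(x @ w - b, 0.0).mean() - np.maximum(y @ w - b, 0.0).mean()

    best = (abs(gap(w, b)), w.copy(), b)
    for _ in range(steps):
        s = 1.0 if gap(w, b) >= 0 else -1.0  # ascend |gap| using its sign
        ax = (x @ w - b > 0).astype(float)    # ReLU active on x-samples
        ay = (y @ w - b > 0).astype(float)    # ReLU active on y-samples
        grad_w = (ax[:, None] * x).mean(axis=0) - (ay[:, None] * y).mean(axis=0)
        grad_b = ay.mean() - ax.mean()        # d(gap)/db
        w = w + lr * s * grad_w
        w /= np.linalg.norm(w)                # project back onto the sphere
        b = b + lr * s * grad_b
        g = abs(gap(w, b))
        if g > best[0]:                       # keep the best neuron seen so far
            best = (g, w.copy(), b)
    return best
```

The objective is nonconvex in (w, b), so a natural design choice is to run several restarts with different `seed` values and keep the largest gap; a framework with automatic differentiation could replace the hand-written gradients.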
