ML LG ST | Jul 7, 2022

Neural Stein critics with staged $L^2$-regularization

arXiv:2207.03406v35.34 citations | h-index: 27 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a fundamental challenge in statistics and machine learning for high-dimensional data analysis, offering an incremental improvement in training methods for distribution testing.

The paper tackles the problem of distinguishing data drawn from an unknown distribution from a model distribution in high-dimensional settings. It proposes a staged $L^2$-regularization procedure for training neural Stein critics, proves a convergence rate of $O(n^{-1/2})$ up to a log factor, and demonstrates the benefit on simulated high-dimensional data and on evaluating generative models of image data.

Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly regularized training at early times. Theoretically, we prove that when the $L^2$ regularization weight is large, the training dynamics are approximated by kernel optimization (the "lazy training" regime), and that training on $n$ samples converges at a rate of $O(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic, assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
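To make the idea of a staged regularization weight concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes an objective of the form $\mathbb{E}_p[T_q f] - \tfrac{\lambda}{2}\,\mathbb{E}_p[f^2]$, a one-dimensional model $q = N(0, 1)$ with score $s_q(x) = -x$, and a constant critic $f(x) = c$ in place of a neural network. The schedule `staged_lambda` and its parameters (`lam_init`, `lam_term`, `decay`, `stage_len`) are hypothetical names chosen for illustration.

```python
import random


def staged_lambda(step, lam_init=1.0, lam_term=0.01, decay=0.5, stage_len=100):
    """Hypothetical staging rule: start with a large L^2 weight and shrink it
    geometrically once per stage, never going below lam_term."""
    stage = step // stage_len
    return max(lam_init * decay ** stage, lam_term)


def stein_operator(c, x):
    """Langevin Stein operator T_q f(x) = s_q(x) f(x) + f'(x) for q = N(0, 1).
    With the constant critic f(x) = c, f'(x) = 0 and s_q(x) = -x."""
    return -x * c


def objective(c, xs, lam):
    """Empirical L^2-regularized objective (assumed form):
    mean of T_q f over the data minus (lam / 2) * mean of f^2."""
    stein_term = sum(stein_operator(c, x) for x in xs) / len(xs)
    penalty = 0.5 * lam * c * c
    return stein_term - penalty


def best_constant_critic(xs, lam):
    """Closed-form maximizer of the objective over constant critics:
    d/dc [-mean(x) * c - (lam / 2) * c^2] = 0  gives  c = -mean(x) / lam."""
    mean_x = sum(xs) / len(xs)
    return -mean_x / lam


# Data from p = N(0.5, 1), which differs from the model q = N(0, 1).
rng = random.Random(0)
xs = [rng.gauss(0.5, 1.0) for _ in range(2000)]

for step in (0, 100, 300, 1000):
    lam = staged_lambda(step)
    c = best_constant_critic(xs, lam)
    # lam * c stays fixed across stages: lam only rescales the optimal critic.
    print(step, lam, c, lam * c)
```

In this toy setting the product `lam * c` equals minus the sample mean at every stage, which reflects the general fact that, under the assumed objective, the regularization weight rescales the optimal critic rather than changing its shape. The sketch does not reproduce the training-dynamics benefits of large early regularization that the abstract describes; those concern neural network optimization and are not visible with a closed-form maximizer.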

