ML | LG | ST | Jul 7, 2022

Neural Stein critics with staged $L^2$-regularization

arXiv:2207.03406v3 | 4 citations | h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in statistics and machine learning for high-dimensional data analysis, offering an incremental improvement in training methods for distribution testing.

The paper tackles the problem of telling apart data drawn from an unknown distribution and a model distribution in high-dimensional settings. It proposes a staged $L^2$-regularization procedure for training neural Stein critics, proves a convergence rate of $O(n^{-1/2})$ up to a log factor, and demonstrates benefits on simulated and image data.

Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic so as to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly regularized training at early times. Theoretically, we prove that when the $L^2$ regularization weight is large, the training dynamics are approximated by kernel optimization (the "lazy training" regime), and that training on $n$ samples converges at a rate of $O(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigen-modes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high-dimensional data and an application to evaluating generative models of image data.
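
To make the idea of a staged regularization weight concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes the regularized objective has the form $E_p[T_q f] - (\lambda/2) E_p\|f\|^2$ with Stein operator $T_q f = s_q \cdot f + \nabla \cdot f$, takes the model $q$ to be a standard Gaussian (so $s_q(x) = -x$), and swaps the neural network critic for a constant vector $f(x) = c$ so that the gradient can be written by hand. The function names, the geometric decay schedule in `staged_lambda`, and all numeric values are hypothetical choices made for illustration.

```python
import numpy as np

def stein_operator_gaussian(f_vals, div_f, x):
    """T_q f(x) = s_q(x) . f(x) + div f(x), assuming q = N(0, I) so s_q(x) = -x."""
    return np.sum(-x * f_vals, axis=1) + div_f

def staged_lambda(step, lam_init=10.0, lam_final=0.5, decay=0.98):
    """Hypothetical staging: geometric decay from lam_init, floored at lam_final."""
    return max(lam_init * decay ** step, lam_final)

def train_constant_critic(x, n_steps=500, lr=0.1):
    """Gradient ascent on mean(T_q f) - (lam / 2) * mean(||f||^2) for f(x) = c."""
    c = np.zeros(x.shape[1])
    lam = staged_lambda(0)
    for step in range(n_steps):
        lam = staged_lambda(step)  # large weight early, smaller weight later
        # For a constant critic, T_q f(x) = -x . c and the divergence is zero,
        # so the gradient of the objective with respect to c is -mean(x) - lam * c.
        grad = -x.mean(axis=0) - lam * c
        c = c + lr * grad
    return c, lam

# Illustrative data: samples from p = N(mu, I), which differs from q = N(0, I).
rng = np.random.default_rng(0)
mu = np.array([0.5, -0.3])
x = rng.normal(size=(2000, 2)) + mu

c, lam = train_constant_critic(x)

# Empirical Stein discrepancy estimate using the trained critic.
f_vals = np.tile(c, (x.shape[0], 1))
div_f = np.zeros(x.shape[0])
stat = stein_operator_gaussian(f_vals, div_f, x).mean()
```

In this toy setting the maximizer at a fixed weight is $c = -\bar{x}/\lambda$, where $\bar{x}$ is the sample mean, so the trained critic should end close to that value at the final weight, and `stat` should be positive when the data mean differs from zero. A neural critic would replace the hand-written gradient with automatic differentiation, including the divergence term.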

Code Implementations: 1 repo
