A theoretical framework for overfitting in energy-based modeling
This work addresses overfitting in energy-based generative models and offers theoretical insights and strategies for improved training. The contribution is incremental, since it builds on existing frameworks such as random matrix theory and neural tangent kernels.
The paper tackles overfitting in energy-based models for inverse problems by analyzing training dynamics and finite-data effects. It shows that optimal early stopping points arise from spectral properties and provides corrections to control overfitting, with extensions to binary-variable models and neural tangent kernel dynamics.
We investigate the impact of limited data on training pairwise energy-based models for inverse problems aimed at identifying interaction networks. Utilizing the Gaussian model as a testbed, we dissect training trajectories across the eigenbasis of the coupling matrix, exploiting the independent evolution of eigenmodes and revealing that the learning timescales are tied to the spectral decomposition of the empirical covariance matrix. We see that optimal points for early stopping arise from the interplay between these timescales and the initial conditions of training. Moreover, we show that finite-data corrections can be accurately modeled through asymptotic random matrix theory calculations and provide the counterpart of generalized cross-validation in the energy-based model context. Our analytical framework extends to binary-variable maximum-entropy pairwise models with minimal variations. These findings offer strategies to control overfitting in discrete-variable models through empirical shrinkage corrections, improving the management of overfitting in energy-based generative models. Finally, we propose a generalization to arbitrary energy-based models by deriving the neural tangent kernel dynamics of the score function under the score-matching algorithm.
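The eigenmode picture of the Gaussian testbed can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the paper: the model is written as p(x) ∝ exp(-x^T J x / 2), training is plain gradient ascent on the log-likelihood starting from J_0 = j0 * I, and the ground-truth covariance, sample size, learning rate, and all function names (mode_dynamics, full_dynamics, gaussian_nll, run_demo) are hypothetical choices made for illustration. Because this J_0 commutes with the empirical covariance, J stays diagonal in the empirical eigenbasis and each coupling eigenvalue follows its own scalar update, so the loss under the true covariance can be tracked mode by mode and its minimum along the trajectory read off as an early stopping point.

```python
import numpy as np

def mode_dynamics(c, j0, lr, steps):
    """Gradient ascent on the Gaussian log-likelihood, one eigenmode at a time.

    c  : eigenvalues of the empirical covariance C_emp
    j0 : initial coupling eigenvalue (assumes J_0 = j0 * I)
    Returns an array of shape (steps + 1, len(c)) of coupling eigenvalues.
    """
    j = np.full_like(c, j0, dtype=float)
    traj = [j.copy()]
    for _ in range(steps):
        # Gradient of 0.5 * (log j - j * c); the factor 1/2 is absorbed in lr.
        j = j + lr * (1.0 / j - c)
        traj.append(j.copy())
    return np.array(traj)

def full_dynamics(C_emp, j0, lr, steps):
    """Same update on the full coupling matrix, for comparison."""
    J = j0 * np.eye(C_emp.shape[0])
    for _ in range(steps):
        J = J + lr * (np.linalg.inv(J) - C_emp)
    return J

def gaussian_nll(j_traj, q):
    """Negative log-likelihood (up to a constant) for each step of a trajectory.

    q[k] = v_k^T C v_k is the variance of the reference covariance C along
    the k-th empirical eigenvector, so Tr(J C) = sum_k j_k * q_k.
    """
    return 0.5 * np.sum(j_traj * q - np.log(j_traj), axis=-1)

def run_demo(n_vars=20, n_samples=40, j0=0.1, lr=0.01, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth: random orthogonal basis, eigenvalues in [0.2, 2].
    Q, _ = np.linalg.qr(rng.normal(size=(n_vars, n_vars)))
    C_true = Q @ np.diag(np.linspace(0.2, 2.0, n_vars)) @ Q.T
    X = rng.multivariate_normal(np.zeros(n_vars), C_true, size=n_samples)
    C_emp = X.T @ X / n_samples

    c, V = np.linalg.eigh(C_emp)                 # empirical spectrum and eigenbasis
    q = np.einsum("ik,ij,jk->k", V, C_true, V)   # true variance along each mode

    traj = mode_dynamics(c, j0, lr, steps)
    train = gaussian_nll(traj, c)   # loss on the training covariance
    test = gaussian_nll(traj, q)    # loss under the true covariance
    return {
        "c": c, "q": q, "V": V, "C_emp": C_emp, "traj": traj,
        "train": train, "test": test, "best_step": int(np.argmin(test)),
    }

if __name__ == "__main__":
    out = run_demo()
    print("early stopping step:", out["best_step"])
    print("test loss there vs. at the end:",
          out["test"][out["best_step"]], out["test"][-1])
```

In this sketch, linearizing the scalar update around its fixed point j_k = 1/c_k gives a relaxation rate of about lr * c_k^2, so directions with small empirical eigenvalues are the slowest to be learned. Where the loss minimum falls depends on both the spectrum and the choice of j0.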