ML Nov 4, 2025
Precise asymptotic analysis of Sobolev training for random feature models
Katharine E Fisher, Matthew TC Li, Youssef Marzouk et al.
Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both function and gradient data -- on the generalization error of highly overparameterized predictive models in high dimensions. In this paper, we obtain a precise characterization of this training modality for random feature (RF) models in the limit where the number of trainable parameters, input dimensions, and training data tend proportionally to infinity. Our model for Sobolev training reflects practical implementations by sketching gradient data onto finite dimensional subspaces. By combining the replica method from statistical physics with linearizations in operator-valued free probability theory, we derive a closed-form description for the generalization errors of the trained RF models. For target functions described by single-index models, we demonstrate that supplementing function data with additional gradient data does not universally improve predictive performance. Rather, the degree of overparameterization should inform the choice of training method. More broadly, our results identify settings where models perform optimally by interpolating noisy function and gradient data.
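To make the training setup in this abstract concrete, here is a minimal NumPy sketch of Sobolev training for a random feature model with sketched gradient data. It is an illustration under stated assumptions, not the authors' formulation: the tanh activation, the 1/sqrt(d) and 1/sqrt(N) scalings, the ridge penalty `lam`, the relative weight `grad_weight`, and all function names are hypothetical choices. Because the model is linear in its trainable weights, fitting function values and sketched gradients together reduces to one stacked ridge regression.

```python
import numpy as np

def rf_features(X, W):
    """Random features tanh(W x / sqrt(d)) / sqrt(N); returns shape (n, N)."""
    d, N = X.shape[1], W.shape[0]
    return np.tanh(X @ W.T / np.sqrt(d)) / np.sqrt(N)

def rf_grad_features(X, W, V):
    """Directional derivatives of every feature along the columns of V.

    V has shape (d, k) and sketches the d-dimensional gradient onto a
    k-dimensional subspace. Returns shape (n, k, N).
    """
    d, N = X.shape[1], W.shape[0]
    Z = X @ W.T / np.sqrt(d)            # pre-activations, (n, N)
    dact = 1.0 - np.tanh(Z) ** 2        # tanh'(Z), (n, N)
    WV = W @ V / np.sqrt(d)             # (N, k): w_j . v_l / sqrt(d)
    return dact[:, None, :] * WV.T[None, :, :] / np.sqrt(N)

def sobolev_ridge_fit(X, y, G, W, V, lam=1e-3, grad_weight=1.0):
    """Ridge regression on function values y (n,) and sketched gradients G (n, k)."""
    N = W.shape[0]
    Phi = rf_features(X, W)                              # (n, N)
    dPhi = rf_grad_features(X, W, V).reshape(-1, N)      # (n * k, N)
    s = np.sqrt(grad_weight)
    A = np.vstack([Phi, s * dPhi])
    b = np.concatenate([y, s * G.reshape(-1)])
    return np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ b)

# Usage with a single-index target f(x) = sin(beta . x / sqrt(d)) (assumed for illustration).
rng = np.random.default_rng(0)
n, d, N, k = 200, 20, 300, 3
X = rng.standard_normal((n, d))
W = rng.standard_normal((N, d))         # frozen random first-layer weights
V = rng.standard_normal((d, k))         # sketching directions
beta = rng.standard_normal(d)
t = X @ beta / np.sqrt(d)
y = np.sin(t)
G = np.cos(t)[:, None] * (beta @ V / np.sqrt(d))[None, :]   # V^T grad f(x_i)
a_sobolev = sobolev_ridge_fit(X, y, G, W, V)
a_plain = sobolev_ridge_fit(X, y, G, W, V, grad_weight=0.0)  # function data only
```

Setting `grad_weight=0.0` recovers ordinary ridge regression on function data, which gives a baseline for comparing the two training modes at different ratios of N to n.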
51.1 CO Mar 13
Scalability of the second-order reliability method for stochastic differential equations with multiplicative noise
Timo Schorlepp, Tobias Grafke
We show how to efficiently compute asymptotically sharp estimates of extreme event probabilities in stochastic differential equations (SDEs) with small multiplicative Brownian noise. The underlying approximation is known as sharp large deviation theory or precise Laplace asymptotics in mathematics, the second-order reliability method (SORM) in reliability engineering, and the instanton or optimal fluctuation method with 1-loop corrections in physics. It is based on approximating the tail probability in question with the most probable realization of the stochastic process, and local perturbations around this realization. We first recall and contextualize the relevant classical theoretical result on precise Laplace asymptotics of diffusion processes [Ben Arous (1988), Stochastics, 25(3), 125-153], and then show how to compute the involved infinite-dimensional quantities - operator traces and Carleman-Fredholm determinants - numerically in a way that is scalable with respect to the time discretization and remains feasible in high spatial dimensions. Using tools from automatic differentiation, we achieve a straightforward black-box numerical computation of the SORM estimates in JAX. The method is illustrated in examples of SDEs and stochastic partial differential equations, including a two-dimensional random advection-diffusion model of a passive scalar. We thereby demonstrate that it is possible to obtain efficient and accurate SORM estimates for very high-dimensional problems, as long as the infinite-dimensional structure of the problem is correctly taken into account. Our JAX implementation of the method is made publicly available.
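The idea of combining the most probable realization with local perturbations around it can be illustrated in a finite-dimensional toy problem. The sketch below is not the SDE method of the paper and is written in plain NumPy rather than JAX: it applies the classical Breitung form of SORM to a standard normal vector whose failure set is {u_1 >= beta + 0.5 * sum_i a_i u_i^2}, so that the most probable failure point sits at distance beta from the origin and the a_i play the role of curvatures. The function names, the parabolic failure set, and the quadrature check are assumptions made for illustration.

```python
import math
import numpy as np

def std_normal_tail(x):
    """P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def sorm_estimate(beta, curvatures):
    """Breitung-type SORM estimate: first-order tail times a curvature correction.

    Assumes 1 + beta * a_i > 0 for every curvature a_i.
    """
    a = np.asarray(curvatures, dtype=float)
    correction = float(np.prod(1.0 + beta * a)) ** -0.5
    return std_normal_tail(beta) * correction

def reference_probability_2d(beta, a, half_width=10.0, num=20001):
    """Trapezoidal quadrature of P(u_1 >= beta + 0.5 * a * u_2^2) in two dimensions."""
    u2 = np.linspace(-half_width, half_width, num)
    density = np.exp(-0.5 * u2 ** 2) / math.sqrt(2.0 * math.pi)
    tail = np.array([std_normal_tail(beta + 0.5 * a * t * t) for t in u2])
    vals = density * tail
    h = u2[1] - u2[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

beta, a = 4.0, 0.3
first_order = std_normal_tail(beta)          # ignores curvature entirely
second_order = sorm_estimate(beta, [a])      # includes the local quadratic correction
reference = reference_probability_2d(beta, a)
print(first_order, second_order, reference)
```

In the SDE setting described above, the distance beta is replaced by the action of the most probable path, and the finite product over curvatures becomes an infinite-dimensional determinant, which is why operator traces and Carleman-Fredholm determinants appear.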