LG · AI · Sep 24, 2025

Sobolev acceleration for neural networks

arXiv:2509.19773v1 · 3 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap for researchers in optimization and deep learning, though it is incremental as it builds on existing Sobolev training methods.

The paper asks why Sobolev training, which adds target derivatives to the loss function, accelerates convergence and improves generalization in neural networks. It presents what the authors describe as the first rigorous theoretical framework proving accelerated convergence for ReLU networks, under a student-teacher setup with Gaussian inputs and shallow architectures. Numerical experiments are reported to support the theory and to suggest that the benefits carry over to modern deep learning tasks.

Sobolev training, which integrates target derivatives into the loss functions, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain only partially understood. In this work, we present the first rigorous theoretical framework proving that Sobolev training accelerates the convergence of Rectified Linear Unit (ReLU) networks. Under a student-teacher framework with Gaussian inputs and shallow architectures, we derive exact formulas for population gradients and Hessians, and quantify the improvements in conditioning of the loss landscape and gradient-flow convergence rates. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks.
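To make the setup in the abstract concrete, here is a minimal NumPy sketch of a student-teacher comparison between a conventional $L^2$ loss and a Sobolev loss that also matches input gradients. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the exact loss or network, so the sum-of-ReLUs architecture, the equal weighting of the value and derivative terms, the 0.5 factors, and all function names (`relu_net`, `relu_net_input_grad`, `l2_loss`, `sobolev_loss`) are assumptions. Expectations over Gaussian inputs are approximated by a sample mean.

```python
import numpy as np

def relu_net(W, X):
    """Shallow ReLU network f(x) = sum_j relu(w_j . x). W: (k, d), X: (n, d)."""
    return np.maximum(X @ W.T, 0.0).sum(axis=1)

def relu_net_input_grad(W, X):
    """Gradient of f with respect to x: sum_j 1[w_j . x > 0] * w_j, shape (n, d)."""
    active = (X @ W.T > 0).astype(X.dtype)  # (n, k) indicator of active neurons
    return active @ W

def l2_loss(W, W_teacher, X):
    """Conventional loss: matches function values only."""
    residual = relu_net(W, X) - relu_net(W_teacher, X)
    return 0.5 * np.mean(residual ** 2)

def sobolev_loss(W, W_teacher, X):
    """Assumed Sobolev (H^1-style) loss: value term plus input-derivative term."""
    grad_residual = relu_net_input_grad(W, X) - relu_net_input_grad(W_teacher, X)
    derivative_term = 0.5 * np.mean(np.sum(grad_residual ** 2, axis=1))
    return l2_loss(W, W_teacher, X) + derivative_term

# Student-teacher setup with standard Gaussian inputs (sizes are arbitrary).
rng = np.random.default_rng(0)
d, k, n = 5, 3, 2000
X = rng.standard_normal((n, d))
W_teacher = rng.standard_normal((k, d))
W_student = rng.standard_normal((k, d))

print("L2 loss:     ", l2_loss(W_student, W_teacher, X))
print("Sobolev loss:", sobolev_loss(W_student, W_teacher, X))
```

Both losses vanish when the student equals the teacher, and the Sobolev loss is never smaller than the $L^2$ loss because it adds a non-negative derivative term. Comparing convergence rates would additionally require a gradient with respect to `W`, which this sketch leaves out.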

