LG · AI · Oct 25, 2024

Simmering: Sufficient is better than optimal for training neural networks

Irina Babayan, Hazhir Aliahmadi, Greg van Anders

arXiv:2410.19912v2 · 3 citations · h-index: 2 · Nat Commun

Originality: Highly original

AI Analysis

This work challenges the optimization paradigm for neural network training. It may benefit researchers and practitioners by pointing to a new class of "sufficient" training algorithms designed to avoid overfitting.

The paper tackles overfitting in neural network training by introducing "simmering", a physics-based method that trains networks to be merely "good enough". Paradoxically, simmering outperforms leading optimization-based approaches such as Adam on classification and regression tasks: it corrects models that Adam has overfit, and it prevents overfitting when used from the start.

The broad range of neural network training techniques that invoke optimization but rely on ad hoc modification for validity suggests that optimization-based training is misguided. Shortcomings of optimization-based training are brought into particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. The broad success of neural networks for modelling physical processes has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right. These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. Here, we introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely "good enough", but which, paradoxically, outperforms leading optimization-based approaches. Using classification and regression examples we show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset. Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms that do not take optimization as their starting point.
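The abstract contrasts optimization, which drives the loss toward a single minimum, with physics-based training at finite temperature, where weights only need to be "good enough". It does not spell out simmering's dynamics, so the sketch below swaps in overdamped Langevin dynamics on a toy linear-regression problem as a hypothetical stand-in; it is not the authors' algorithm. The function names (`optimize`, `simmer_sketch`, `ensemble_predict`), the toy data, and the temperature value are illustrative assumptions.

```python
import math
import random


def loss_and_grad(w, b, data):
    """Mean squared error of the line y = w*x + b, and its gradient."""
    n = len(data)
    loss = gw = gb = 0.0
    for x, y in data:
        r = w * x + b - y  # residual
        loss += r * r / n
        gw += 2.0 * r * x / n
        gb += 2.0 * r / n
    return loss, gw, gb


def optimize(data, lr=0.05, steps=2000):
    """Plain gradient descent: drive the loss toward its minimum."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, gw, gb = loss_and_grad(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b


def simmer_sketch(data, temperature=0.01, lr=0.05, burn_in=1000, samples=2000, seed=0):
    """Hypothetical stand-in for simmering: overdamped Langevin dynamics.

    Each step follows the gradient plus Gaussian noise of variance
    2*lr*temperature, so the parameters keep fluctuating through a low-loss
    region (approximately sampling exp(-loss/temperature)) instead of settling
    on one minimizer. Returns the (w, b) pairs visited after burn-in.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    noise = math.sqrt(2.0 * lr * temperature)
    visited = []
    for step in range(burn_in + samples):
        _, gw, gb = loss_and_grad(w, b, data)
        w += -lr * gw + noise * rng.gauss(0.0, 1.0)
        b += -lr * gb + noise * rng.gauss(0.0, 1.0)
        if step >= burn_in:
            visited.append((w, b))
    return visited


def ensemble_predict(visited, x):
    """Average the predictions of all sampled parameter sets at input x."""
    return sum(w * x + b for w, b in visited) / len(visited)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: y = 2x + 0.5 plus Gaussian noise, x in [-1, 1].
    data = [(k / 10, 2.0 * (k / 10) + 0.5 + rng.gauss(0.0, 0.1)) for k in range(-10, 11)]
    w_opt, b_opt = optimize(data)
    visited = simmer_sketch(data)
    print("optimized prediction at x=0.3:", w_opt * 0.3 + b_opt)
    print("ensemble prediction at x=0.3: ", ensemble_predict(visited, 0.3))
```

Raising `temperature` widens the set of parameter values the sampler visits; at `temperature=0.0` the update reduces to plain gradient descent. This toy example only shows the mechanism and makes no claim about reproducing the paper's overfitting results.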

