LG NA Aug 22, 2024

Geometrical structures of digital fluctuations in parameter space of neural networks trained with adaptive momentum optimization

arXiv:2408.12273v11 citations h-index: 2

Originality: Synthesis-oriented

AI Analysis

The paper addresses a stability issue in optimization methods widely used by machine learning practitioners, but it is incremental: it builds on known problems without proposing a new solution.

The paper investigates numerical instability in neural networks trained with adaptive momentum optimization, showing that artifacts leading to divergence occur even in shallow networks, as demonstrated by experiments with over 1600 networks trained for 50,000 epochs.

We present the results of numerical experiments on neural networks trained with stochastic gradient-based optimization with adaptive momentum. This widely applied optimization method has proven convergence and practical efficiency, but it becomes numerically unstable in long-run training. We show that numerical artifacts are observable not only in large-scale models: they eventually lead to divergence even in shallow, narrow networks. We support this claim with experiments on more than 1600 neural networks trained for 50,000 epochs. Local observations show the same behavior of network parameters in both stable and unstable training segments. Geometrically, the parameters trace double twisted spirals in parameter space; this behavior is caused by numerical perturbations alternating with subsequent relaxation oscillations in the values of the first and second momentum.
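For readers unfamiliar with the "1st and 2nd momentum" the abstract refers to, the following is a minimal sketch of an adaptive momentum update, assuming the standard Adam formulation. The abstract does not name the exact optimizer, hyperparameters, or loss, so the function names (`adam_step`, `minimize_quadratic`), the default values, and the toy quadratic objective are all illustrative assumptions and not the paper's experimental setup.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update for a scalar parameter (assumed formulation).

    Returns the updated (theta, m, v). `t` is the 1-based step counter.
    """
    # First momentum: exponential moving average of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second momentum: exponential moving average of the squared gradient.
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # eps guards the division when v_hat is tiny; this ratio is the place
    # where small floating-point perturbations can be amplified.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(theta0=1.0, steps=2000, lr=1e-2):
    """Run the update on the toy objective f(theta) = theta**2.

    Records (theta, m, v) at every step so the joint trajectory of the
    parameter and both momentum values can be inspected or plotted.
    """
    theta, m, v = theta0, 0.0, 0.0
    history = []
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # analytic gradient of theta**2
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
        history.append((theta, m, v))
    return history
```

Plotting the recorded `(theta, m, v)` triples against each other is one way to look at trajectories in a joint parameter and momentum space. A toy run of this kind is not expected to reproduce the spirals or the divergence reported for long-run training.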
