Noise-induced degeneration in online learning
This work addresses training slowdown in neural networks and is aimed at researchers; it is incremental in that it builds on known plateau phenomena.
The paper analyzes how noise in stochastic gradient descent causes plateau phenomena in a minimal multi-layer perceptron model, showing that attracting regions exist in degenerated subspaces, a strong plateau emerges as noise-induced synchronization, and an optimal fluctuation minimizes escape time.
To elucidate the plateau phenomena caused by vanishing gradients, we herein analyse the stability of stochastic gradient descent near degenerated subspaces in a multi-layer perceptron. For stochastic gradient descent in the Fukumizu-Amari model, which is the minimal multi-layer perceptron showing non-trivial plateau phenomena, we show that (1) attracting regions exist in multiply degenerated subspaces, (2) a strong plateau phenomenon emerges as a noise-induced synchronisation, which is not observed in deterministic gradient descent, and (3) an optimal fluctuation exists that minimises the escape time from the degenerated subspace. The noise-induced degeneration observed herein is expected to be found in a broad class of machine learning via neural networks.
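As a rough illustration of the setting, the sketch below runs online stochastic gradient descent (one fresh sample per step) on a two-hidden-unit perceptron in a teacher-student arrangement and records how far the student is from a degenerated subspace. Everything specific here is an assumption made for illustration rather than taken from the paper: the scalar-input tanh network standing in for the Fukumizu-Amari model, the identification of the degenerated subspace with w1 = w2, the teacher parameters, the learning rate, and the function names `student` and `online_sgd`.

```python
import math
import random


def student(x, w, v):
    """Two-hidden-unit perceptron: f(x) = v[0]*tanh(w[0]*x) + v[1]*tanh(w[1]*x)."""
    return sum(vi * math.tanh(wi * x) for wi, vi in zip(w, v))


def online_sgd(steps=2000, lr=0.05, seed=0):
    """Run online SGD (one fresh sample per step) and record |w1 - w2|."""
    rng = random.Random(seed)
    # Hypothetical teacher parameters, chosen only for illustration.
    w_teacher, v_teacher = [1.5, -0.5], [1.0, 1.0]
    # The student starts close to the assumed degenerated subspace w1 == w2.
    w, v = [0.30, 0.31], [0.5, 0.5]
    gaps = []
    for _ in range(steps):
        x = rng.gauss(0.0, 1.0)  # a single random input per update
        err = student(x, w, v) - student(x, w_teacher, v_teacher)
        # Gradients of the single-sample squared loss 0.5 * err**2.
        h = [math.tanh(wi * x) for wi in w]
        grad_v = [err * hi for hi in h]
        grad_w = [err * vi * (1.0 - hi * hi) * x for vi, hi in zip(v, h)]
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
        gaps.append(abs(w[0] - w[1]))  # distance from w1 == w2
    return w, v, gaps
```

Calling `online_sgd()` returns the final weights together with the history of |w1 - w2|; plotting that history for several values of `lr` is one way to explore how the size of the fluctuation affects the time spent near the subspace, although this toy setup is not guaranteed to reproduce the paper's results.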