LG · Apr 23, 2023

The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations

arXiv:2304.11692v42 citations · h-index: 21
Originality: Incremental advance
AI Analysis

The paper addresses a specific training-instability problem that deep learning practitioners encounter, but it appears incremental because it builds on known issues with batch normalization (BN) and ReLU.

The paper studies the instability that arises early in the training of deep neural networks using batch normalization and ReLU, and attributes it to gradient explosion. It finds that correlations between activations mitigate the explosion, and it proposes an adaptive learning rate algorithm to control the instability.

Deep neural networks, which employ batch normalization and ReLU-like activation functions, suffer from instability in the early stages of training due to the high gradient induced by temporal gradient explosion. In this study, we analyze the occurrence and mitigation of gradient explosion both theoretically and empirically, and discover that the correlation between activations plays a key role in preventing the gradient explosion from persisting throughout the training. Finally, based on our observations, we propose an improved adaptive learning rate algorithm to effectively control the training instability.
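
The first half of the abstract's claim, that stacking batch normalization and ReLU lets gradients grow as they are backpropagated, can be probed with a small experiment. The following is a minimal sketch under stated assumptions, not the paper's analysis or its proposed learning rate algorithm: it assumes a fully connected network at random Gaussian initialization, batch normalization without learnable scale or shift, a random upstream gradient in place of a real loss, and it tracks the gradient with respect to activations rather than weights. The names `bn_forward`, `bn_backward`, and `gradient_norms` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def bn_forward(z, eps=1e-5):
    """Normalize each feature over the batch (no learnable scale/shift)."""
    mu = z.mean(axis=0)
    var = z.var(axis=0)  # biased batch variance
    inv_std = 1.0 / np.sqrt(var + eps)
    z_hat = (z - mu) * inv_std
    return z_hat, inv_std


def bn_backward(g_hat, z_hat, inv_std):
    """Gradient w.r.t. the BN input, given the gradient w.r.t. its output."""
    return inv_std * (
        g_hat
        - g_hat.mean(axis=0)
        - z_hat * (g_hat * z_hat).mean(axis=0)
    )


def gradient_norms(depth=30, width=256, batch=256, seed=0):
    """Return activation-gradient norms, ordered from input to output.

    Assumption: a plain MLP of Linear -> BN -> ReLU blocks at random
    initialization, with a random upstream gradient standing in for a loss.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    weights, caches = [], []

    # Forward pass: store what the backward pass needs.
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        z_hat, inv_std = bn_forward(h @ w)
        h = np.maximum(z_hat, 0.0)
        weights.append(w)
        caches.append((z_hat, inv_std))

    # Backward pass: start from a random gradient at the output.
    g = rng.standard_normal(h.shape)
    norms = [np.linalg.norm(g)]
    for w, (z_hat, inv_std) in zip(reversed(weights), reversed(caches)):
        g = g * (z_hat > 0)                # ReLU backward
        g = bn_backward(g, z_hat, inv_std)  # BN backward
        g = g @ w.T                         # linear layer backward
        norms.append(np.linalg.norm(g))

    return norms[::-1]  # norms[0]: at the input, norms[-1]: at the output


if __name__ == "__main__":
    norms = gradient_norms()
    print("gradient norm at output:", norms[-1])
    print("gradient norm at input: ", norms[0])
    print("ratio input/output:     ", norms[0] / norms[-1])
```

In this setup the ratio printed at the end indicates how much the gradient norm grows across the stack at initialization. The sketch says nothing about how activation correlations change during training, which is the part of the argument the paper develops.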

