Backward Gradient Normalization in Deep Neural Networks
This addresses a known bottleneck in training deep neural networks, but the approach appears incremental as it modifies existing backpropagation methods rather than introducing a new paradigm.
The paper tackles the problem of vanishing or exploding gradients in very deep neural networks by introducing a new gradient normalization technique during the backward pass, which improves network accuracy in several experimental conditions.
We introduce a new technique for gradient normalization during neural network training. The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture. These normalization nodes do not affect forward activity propagation, but modify backpropagation equations to permit a well-scaled gradient flow that reaches the deepest network layers without experimenting vanishing or explosion. Results on tests with very deep neural networks show that the new technique can do an effective control of the gradient norm, allowing the update of weights in the deepest layers and improving network accuracy on several experimental conditions.