NE · LG · ML · Mar 8, 2024

Linearly Constrained Weights: Reducing Activation Shift for Faster Training of Neural Networks

arXiv:2403.13833v1 · 1 citation · h-index: 7 · ECML/PKDD
Originality: Incremental advance
AI Analysis

The paper addresses the vanishing gradient problem to make neural network training faster and more stable, though the contribution appears incremental because it builds on existing normalization techniques.

The paper tackles activation shift in neural networks, in which a neuron's preactivation has a non-zero mean, and which contributes to vanishing gradients. It proposes linearly constrained weights (LCW) to reduce this shift. Experimental results show that LCW enables efficient training of deep sigmoid networks by resolving vanishing gradients and, when combined with batch normalization, improves generalization in feedforward and convolutional networks.

In this paper, we first identify activation shift, a simple but remarkable phenomenon in a neural network in which the preactivation value of a neuron has non-zero mean that depends on the angle between the weight vector of the neuron and the mean of the activation vector in the previous layer. We then propose linearly constrained weights (LCW) to reduce the activation shift in both fully connected and convolutional layers. The impact of reducing the activation shift in a neural network is studied from the perspective of how the variance of variables in the network changes through layer operations in both forward and backward chains. We also discuss its relationship to the vanishing gradient problem. Experimental results show that LCW enables a deep feedforward network with sigmoid activation functions to be trained efficiently by resolving the vanishing gradient problem. Moreover, combined with batch normalization, LCW improves generalization performance of both feedforward and convolutional networks.
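The activation shift described above follows from linearity: for a preactivation z = w · a, the mean is E[z] = w · E[a] = ‖w‖ ‖E[a]‖ cos θ, where θ is the angle between the weight vector and the mean activation vector. The NumPy sketch below illustrates this on synthetic data. The abstract does not state the form of the linear constraint, so the zero-sum projection used here (`project_zero_sum`) is an assumption chosen for illustration, as are all function names and the uniform activation distribution.

```python
import numpy as np


def preactivation_mean(w, activations):
    """Empirical mean of the preactivation z = w . a over a batch of activations."""
    return float(np.mean(activations @ w))


def shift_from_angle(w, activations):
    """Same quantity written as ||w|| * ||mu|| * cos(theta), with mu the mean activation."""
    mu = activations.mean(axis=0)
    cos_theta = (w @ mu) / (np.linalg.norm(w) * np.linalg.norm(mu))
    return float(np.linalg.norm(w) * np.linalg.norm(mu) * cos_theta)


def project_zero_sum(w):
    """Assumed constraint for illustration: project w onto {w : sum(w) = 0}."""
    return w - w.mean()


rng = np.random.default_rng(0)
n_samples, n_inputs = 10_000, 256

# Non-negative activations (as a sigmoid layer would produce), so the mean
# activation vector is far from zero and roughly parallel to the all-ones vector.
activations = rng.uniform(0.0, 1.0, size=(n_samples, n_inputs))

w = rng.normal(0.0, 1.0, size=n_inputs)  # unconstrained weight vector
w_lcw = project_zero_sum(w)              # weight vector with the assumed constraint

shift_plain = preactivation_mean(w, activations)
shift_lcw = preactivation_mean(w_lcw, activations)

print("mean preactivation, unconstrained:", shift_plain)
print("mean preactivation, zero-sum     :", shift_lcw)
```

Under this assumption, the constrained weight vector is orthogonal to the all-ones direction, so when every input has roughly the same mean activation the constrained mean preactivation stays close to zero, while the unconstrained one scales with the sum of the weights.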

