Dying ReLU and Initialization: Theory and Numerical Examples
Of interest to both researchers and practitioners, this work addresses a fundamental problem in deep learning, neuron death in ReLU networks, by providing both a theoretical analysis and a practical remedy. The contribution is somewhat incremental, since the remedy builds on existing initialization methods.
The paper tackles the dying ReLU problem in deep neural networks, in which neurons become permanently inactive. It proves that a deep ReLU network dies in probability as its depth increases, and it proposes a randomized asymmetric initialization that effectively prevents this.
The dying ReLU refers to the problem in which ReLU neurons become inactive and output 0 for every input. There are many empirical and heuristic explanations of why ReLU neurons die; however, the phenomenon has received little theoretical analysis. In this paper, we rigorously prove that a deep ReLU network will eventually die in probability as the depth goes to infinity. Several methods have been proposed to alleviate the dying ReLU. Perhaps one of the simplest treatments is to modify the initialization procedure. A common way of initializing weights and biases uses symmetric probability distributions, which leaves the network susceptible to the dying ReLU. We thus propose a new initialization procedure, namely, a randomized asymmetric initialization. We prove that the new initialization can effectively prevent the dying ReLU. All parameters required by the new initialization are designed theoretically. Numerical examples are provided to demonstrate the effectiveness of the new initialization procedure.
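To make the depth claim and the symmetric-versus-asymmetric contrast concrete, here is a small Monte Carlo sketch in Python with NumPy. It counts a freshly initialized network as dead when some hidden layer's output is constant over a finite sample of inputs. This is only a proxy for the network being a constant function, and it can overestimate death. The sketch then estimates how often this happens as depth grows. `symmetric_layer` draws weights and biases from zero-mean normals with He-style variance. `asymmetric_layer` is a hypothetical simplification of the asymmetric idea: it forces one randomly chosen entry of each neuron's augmented row [w, b] to be positive via a Beta(2, 1) draw. All function names, the input domain [-1, 1], the width and depth values, and the distributions are assumptions for illustration. None of them are the paper's theoretically designed parameters.

```python
import numpy as np


def symmetric_layer(rng, fan_in, fan_out):
    """Weights and biases from zero-mean normals (assumed He-style variance 2/fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    W = rng.normal(0.0, std, size=(fan_out, fan_in))
    b = rng.normal(0.0, std, size=fan_out)
    return W, b


def asymmetric_layer(rng, fan_in, fan_out):
    """Hypothetical simplified asymmetric scheme (illustration only).

    For each neuron, one randomly chosen entry of the augmented row [w, b]
    is replaced by a strictly positive Beta(2, 1) draw; the other entries keep
    the symmetric normal draw. Not the paper's exact procedure or parameters.
    """
    W, b = symmetric_layer(rng, fan_in, fan_out)
    V = np.hstack([W, b[:, None]])                 # augmented rows [w_i, b_i]
    k = rng.integers(0, fan_in + 1, size=fan_out)  # one index per neuron
    V[np.arange(fan_out), k] = rng.beta(2.0, 1.0, size=fan_out)
    return V[:, :fan_in], V[:, fan_in]


def first_collapse_depth(rng, init, d_in, width, depth, n_inputs=256, tol=1e-12):
    """Return the first hidden layer (1-based) whose output is constant over
    the sampled inputs, or depth + 1 if no hidden layer collapses."""
    X = rng.uniform(-1.0, 1.0, size=(n_inputs, d_in))
    fan_in = d_in
    for layer in range(1, depth + 1):
        W, b = init(rng, fan_in, width)
        X = np.maximum(X @ W.T + b, 0.0)           # ReLU hidden layer
        if np.all(np.ptp(X, axis=0) <= tol):       # every neuron constant over inputs
            return layer                           # deeper layers stay constant too
        fan_in = width
    return depth + 1


def dead_fraction(init, d_in, width, depths, n_trials=200, seed=0):
    """Fraction of sampled networks already collapsed at each depth in `depths`."""
    rng = np.random.default_rng(seed)
    max_depth = max(depths)
    first = np.array([first_collapse_depth(rng, init, d_in, width, max_depth)
                      for _ in range(n_trials)])
    return [float(np.mean(first <= L)) for L in depths]


if __name__ == "__main__":
    depths = [1, 5, 10, 20, 50]
    print("symmetric: ", dead_fraction(symmetric_layer, d_in=1, width=2, depths=depths))
    print("asymmetric:", dead_fraction(asymmetric_layer, d_in=1, width=2, depths=depths))
```

Each trial records the first collapsing layer, so the estimated dead fraction is non-decreasing in depth by construction. Width 2 is chosen so that collapse shows up at modest depths. The toy only lets one compare the two schemes side by side. It is no substitute for the paper's numerical examples.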