Static Activation Function Normalization
This work addresses the need for more efficient and robust training of deep networks. It offers benefits similar in spirit to batch normalization without the added computational cost, though the contribution appears incremental because it builds on previously established principles.
The paper tackles the problem of improving convergence speed and robustness in deep neural networks by introducing static activation function normalization, a transform applied to existing activation functions such as ReLU. The authors report empirically that it improves convergence robustness, maximum training depth, and anytime performance, and they support these claims with empirical eigenvalue distributions.
Recent seminal work at the intersection of deep neural network practice and random matrix theory has linked the convergence speed and robustness of these networks to the combination of random weight initialization and the nonlinear activation function in use. Building on those principles, we introduce a process to transform an existing activation function into another one with better properties. We term such a transform \emph{static activation normalization}. More specifically, we focus on this normalization applied to the ReLU unit, and show empirically that it significantly promotes convergence robustness, maximum training depth, and anytime performance. We verify these claims by examining empirical eigenvalue distributions of networks trained with those activations. Our static activation normalization provides a first step towards giving benefits similar in spirit to schemes like batch normalization, but without computational cost.
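
The abstract does not spell out the transform itself, so the following is only an illustrative sketch of one way a "static" normalization of ReLU could look, not the authors' definition. It assumes the transform is a fixed shift and scale chosen so that the activation has zero mean and unit variance when its input is standard Gaussian. The constants are computed once in closed form, so no batch statistics are needed at run time. The names `normalized_relu`, `RELU_MEAN`, `RELU_STD`, and `sample_moments` are hypothetical.

```python
import math
import random

# Closed-form moments of ReLU(z) for z ~ N(0, 1):
#   E[ReLU(z)]   = 1 / sqrt(2*pi)
#   E[ReLU(z)^2] = 1 / 2
# ASSUMPTION: "static normalization" is read here as a fixed affine
# correction using these constants; the source does not confirm this.
RELU_MEAN = 1.0 / math.sqrt(2.0 * math.pi)
RELU_STD = math.sqrt(0.5 - 1.0 / (2.0 * math.pi))


def relu(x: float) -> float:
    """Standard rectified linear unit."""
    return max(x, 0.0)


def normalized_relu(x: float) -> float:
    """ReLU shifted and scaled by constants fixed ahead of training."""
    return (relu(x) - RELU_MEAN) / RELU_STD


def sample_moments(fn, n: int = 200_000, seed: int = 0):
    """Monte Carlo estimate of the mean and variance of fn(z), z ~ N(0, 1)."""
    rng = random.Random(seed)
    ys = [fn(rng.gauss(0.0, 1.0)) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return mean, var


if __name__ == "__main__":
    # Plain ReLU has a positive mean and variance below one under N(0, 1);
    # the normalized variant should land close to mean 0 and variance 1.
    print("relu:           ", sample_moments(relu))
    print("normalized relu:", sample_moments(normalized_relu))
```

Because the shift and scale are constants, this variant costs one subtraction and one division per unit beyond ReLU itself, which matches the abstract's contrast with batch normalization only in spirit; whether the paper's transform takes this exact form is not stated in the text above.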