LG, Aug 9, 2022

On the Activation Function Dependence of the Spectral Bias of Neural Networks

Qingguo Hong, Jonathan W. Siegel, Qinyang Tan, Jinchao Xu

arXiv:2208.04924v3, 20.842 citations, h-index: 66

Originality: Incremental advance

AI Analysis

This paper addresses the spectral bias of neural networks, a concern for practitioners, by proposing the Hat activation function as a replacement for ReLU that speeds up training and improves accuracy. The advance is incremental because it builds on prior work on activation functions.

The paper gives a theoretical explanation for the spectral bias of ReLU networks and predicts that switching to the Hat activation function removes this bias. Experiments in a variety of settings confirm the prediction and also show significantly faster training. The improved generalization on image classification tasks is a result of previous work that the paper cites, not a new finding.

Neural networks are universal function approximators which are known to generalize well despite being dramatically overparameterized. We study this phenomenon from the point of view of the spectral bias of neural networks. Our contributions are two-fold. First, we provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods. Second, based upon this theory we predict that switching the activation function to a piecewise linear B-spline, namely the Hat function, will remove this spectral bias, which we verify empirically in a variety of settings. Our empirical studies also show that neural networks with the Hat activation function are trained significantly faster using stochastic gradient descent and ADAM. Combined with previous work showing that the Hat activation function also improves generalization accuracy on image classification tasks, this indicates that using the Hat activation provides significant advantages over the ReLU on certain problems.
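To make the "piecewise linear B-spline" concrete, here is a minimal Python sketch of a hat-shaped activation. The abstract does not state the exact support or scaling, so this sketch assumes the standard linear B-spline on [0, 2] with its peak at 1. The function names `relu`, `hat`, and `hat_from_relu` are illustrative and are not taken from the paper's code.

```python
def relu(x: float) -> float:
    """Standard rectified linear unit."""
    return max(0.0, x)


def hat(x: float) -> float:
    """Hat function (assumed form): rises linearly on [0, 1],
    falls linearly on [1, 2], and is zero everywhere else."""
    if 0.0 <= x < 1.0:
        return x
    if 1.0 <= x < 2.0:
        return 2.0 - x
    return 0.0


def hat_from_relu(x: float) -> float:
    """The same hat shape written as a combination of three shifted ReLUs."""
    return relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)


if __name__ == "__main__":
    # Print both forms side by side at a few sample points.
    for x in (-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
        print(x, hat(x), hat_from_relu(x))
```

Under this assumed form, the Hat function is compactly supported, unlike the ReLU, yet it is still piecewise linear and can be written exactly as a sum of three ReLUs.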
