Variations on the Chebyshev-Lagrange Activation Function
This work addresses data efficiency for neural network practitioners, offering an incremental improvement over existing activation functions like ReLU or tanh.
The paper tackles data efficiency in neural networks by introducing parameterized piecewise polynomial activation functions built from Chebyshev nodes and Lagrange interpolation. On synthetic datasets, variants that extrapolate linearly outside [-1, 1] show significant improvements in expressive capacity and interpolation accuracy, and the linearly extrapolated activations achieve competitive or state-of-the-art performance on image classification and vector datasets.
We seek to improve the data efficiency of neural networks and present novel implementations of parameterized piecewise polynomial activation functions. The parameters are the y-coordinates of n+1 Chebyshev nodes per hidden unit, and Lagrange interpolation between the nodes produces the polynomial on [-1, 1]. We show results for different methods of handling inputs outside [-1, 1] on synthetic datasets, finding significant improvements in capacity of expression and accuracy of interpolation in models that compute some form of linear extrapolation from either end. We demonstrate competitive or state-of-the-art performance on the classification of images (MNIST and CIFAR-10) and minimally-correlated vectors (DementiaBank) when we replace ReLU or tanh with linearly extrapolated Chebyshev-Lagrange activations in deep residual architectures.
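The construction described in the abstract can be illustrated with a minimal sketch for a single hidden unit. It is not the authors' implementation, and several details are assumptions: the nodes are taken to be Chebyshev nodes of the first kind, the extrapolation outside [-1, 1] continues the tangent line at the nearest endpoint (one possible form of "linear extrapolation"), and the function names `chebyshev_nodes`, `lagrange_value_and_slope`, and `chebyshev_lagrange_activation` are hypothetical.

```python
import math


def chebyshev_nodes(n):
    """Return n + 1 Chebyshev nodes of the first kind (assumed node family)."""
    return [math.cos((2 * k + 1) * math.pi / (2 * (n + 1))) for k in range(n + 1)]


def lagrange_value_and_slope(x, nodes, ys):
    """Evaluate the Lagrange interpolant through (nodes, ys) and its derivative at x."""
    value = 0.0
    slope = 0.0
    for j, (xj, yj) in enumerate(zip(nodes, ys)):
        # Basis polynomial L_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m).
        basis = 1.0
        for m, xm in enumerate(nodes):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        # Derivative L_j'(x) = sum_{i != j} 1 / (x_j - x_i)
        #                      * prod_{m != i, j} (x - x_m) / (x_j - x_m).
        dbasis = 0.0
        for i, xi in enumerate(nodes):
            if i == j:
                continue
            term = 1.0 / (xj - xi)
            for m, xm in enumerate(nodes):
                if m != i and m != j:
                    term *= (x - xm) / (xj - xm)
            dbasis += term
        value += yj * basis
        slope += yj * dbasis
    return value, slope


def chebyshev_lagrange_activation(x, ys):
    """Polynomial on [-1, 1]; tangent-line extrapolation outside (an assumption).

    ys holds the learnable y-coordinates at the n + 1 nodes of one hidden unit.
    """
    nodes = chebyshev_nodes(len(ys) - 1)
    xc = max(-1.0, min(1.0, x))  # nearest point inside [-1, 1]
    value, slope = lagrange_value_and_slope(xc, nodes, ys)
    # Inside the interval x == xc, so the second term vanishes.
    return value + slope * (x - xc)
```

As a sanity check under these assumptions, setting `ys` to the squares of the nodes (with n >= 2) reproduces x^2 on [-1, 1], and an input of 2.0 maps to approximately 3.0, the value of the tangent line at x = 1. In a real network the `ys` would be trained per hidden unit and the evaluation vectorized.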