AI CV LG NE

May 7, 2024

A Significantly Better Class of Activation Functions Than ReLU Like Activation Functions

arXiv:2405.04459v12.34 citations, h-index: 2

Originality: Incremental advance

AI Analysis

This paper addresses the need for more efficient and effective activation functions in deep learning by offering an alternative to ReLU-like functions. The advance appears incremental, as it builds on existing activation function concepts.

The paper tackles the problem of improving activation functions in neural networks by introducing Cone and Parabolic-Cone functions, which achieve higher accuracies with significantly fewer neurons on CIFAR-10 and Imagenette benchmarks, and speed up training due to larger derivatives.

This paper introduces a significantly better class of activation functions than the almost universally used ReLU-like and sigmoidal classes of activation functions. Two new activation functions, the Cone and the Parabolic-Cone, are proposed; they differ drastically from popular activation functions and significantly outperform them on the CIFAR-10 and Imagenette benchmarks. The cone activation functions are positive only on a finite interval, zero at the end-points of that interval, and strictly negative everywhere else. Thus the set of inputs that produce a positive output for a neuron with a cone activation function is a hyperstrip and not a half-space, as is usually the case. Since a hyperstrip is the region between two parallel hyperplanes, it allows neurons to divide the input feature space into positive and negative classes more finely than infinitely wide half-spaces do. In particular, the XOR function can be learned by a single neuron with a cone-like activation function. Both the Cone and Parabolic-Cone activation functions are shown to achieve higher accuracies with significantly fewer neurons on these benchmarks. The results presented in this paper indicate that many nonlinear real-world datasets may be separated with fewer hyperstrips than half-spaces. The Cone and Parabolic-Cone activation functions have larger derivatives than ReLU and are shown to significantly speed up training.
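
The hyperstrip and XOR claims in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the forms used here, Cone as 1 - |z - 1| and Parabolic-Cone as z(2 - z), are assumptions chosen to match its description (positive on a finite interval, zero at the end-points, negative elsewhere). The weights and bias are hand-picked for illustration and are not learned or taken from the paper.

```python
def cone(z):
    # Assumed form: positive on (0, 2), zero at z = 0 and z = 2, negative elsewhere.
    return 1.0 - abs(z - 1.0)


def parabolic_cone(z):
    # Assumed form: a downward parabola with the same zeros at z = 0 and z = 2.
    return z * (2.0 - z)


def neuron(x, w, b, activation):
    # A single neuron: affine pre-activation followed by the activation function.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


# Hypothetical, hand-picked parameters: the pre-activation is
# -1 for (0, 0), 1 for (0, 1) and (1, 0), and 3 for (1, 1),
# so only the two "exactly one input is 1" cases fall inside the strip 0 < z < 2.
W, B = (2.0, 2.0), -1.0

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_predictions = [neuron(x, W, B, cone) > 0 for x in xor_inputs]
xor_predictions_parabolic = [neuron(x, W, B, parabolic_cone) > 0 for x in xor_inputs]

for x, pred in zip(xor_inputs, xor_predictions):
    print(x, pred)
```

A half-space neuron (for example ReLU followed by a threshold) cannot reproduce this labelling with one unit, because the two positive points lie between the two negative points along the direction (1, 1); the strip 0 < z < 2 is what isolates them.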
