SwishReLU: A Unified Approach to Activation Functions for Enhanced Deep Neural Networks Performance
This work addresses activation function inefficiencies that matter to deep learning practitioners, but the contribution is incremental, since it builds on existing activation functions such as ReLU and Swish.
The paper tackles the 'Dying ReLU' problem in deep neural networks by proposing SwishReLU, a novel activation function that combines elements of ReLU and Swish. It reports a 6% accuracy improvement on CIFAR-10 with VGG16 and a lower computational cost than Swish.
ReLU, a commonly used activation function in deep neural networks, is prone to the "Dying ReLU" problem. Several enhanced versions, such as ELU, SeLU, and Swish, have been introduced, but they remain less widely used. Replacing ReLU is challenging because the alternatives do not offer consistent advantages. Swish behaves similarly to ReLU with a smoother transition, but it generally incurs a greater computational cost than ReLU. This paper proposes SwishReLU, a novel activation function combining elements of ReLU and Swish. Our findings show that SwishReLU outperforms ReLU at a lower computational cost than Swish. We examine and compare different ReLU variants with SwishReLU. Specifically, we compare ELU and SeLU, along with Tanh, on three datasets: CIFAR-10, CIFAR-100, and MNIST. Notably, applying SwishReLU in the VGG16 model described in Algorithm 2 yields a 6% accuracy improvement on the CIFAR-10 dataset.
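To make the comparison concrete, the sketch below implements ReLU and Swish in plain Python, together with one plausible way of combining them. The abstract does not give the definition of SwishReLU, so the piecewise form in `swish_relu_sketch` (identity for positive inputs, Swish for non-positive inputs) is an assumption made for illustration only, and the function name is hypothetical.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish with beta = 1: x * sigmoid(x)."""
    return x * sigmoid(x)


def swish_relu_sketch(x: float) -> float:
    """Hypothetical combination of ReLU and Swish.

    ASSUMPTION: this piecewise form is not taken from the abstract.
    Positive inputs use the cheap identity branch (as in ReLU);
    non-positive inputs use Swish, so they keep a nonzero output
    and gradient instead of being clamped to zero.
    """
    return x if x > 0 else swish(x)


if __name__ == "__main__":
    for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(value, relu(value), swish(value), swish_relu_sketch(value))
```

Under this assumed form, the sigmoid is evaluated only for non-positive inputs, which is one way a combined function could cost less than Swish while still passing a signal for negative inputs, unlike ReLU.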