Swish-T: Enhancing Swish Activation with Tanh Bias for Improved Neural Network Performance
This work addresses the need for better activation functions in neural networks, though its contribution is incremental because it builds on the existing Swish function.
The paper aims to improve neural network performance by adding a Tanh bias to the Swish activation function. The resulting Swish-T family is reported to achieve superior empirical results on benchmark datasets including MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100.
We propose the Swish-T family, an enhancement of the existing non-monotonic activation function Swish. Swish-T is defined by adding a Tanh bias to the original Swish function. This modification yields a family of Swish-T variants, each with specific advantages depending on the task. The Tanh bias allows broader acceptance of negative values during the initial stages of training and gives a smoother non-monotonic curve than the original Swish. We ultimately propose the Swish-T$_{\textbf{C}}$ function, while Swish-T and Swish-T$_{\textbf{B}}$, byproducts of Swish-T$_{\textbf{C}}$, also demonstrate satisfactory performance. Furthermore, our ablation study shows that Swish-T$_{\textbf{C}}$ still achieves high performance when used as a non-parametric function. The superiority of the Swish-T family is demonstrated empirically across various models and benchmark datasets, including MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100. The code is publicly available at https://github.com/ictseoyoungmin/Swish-T-pytorch.
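The idea of adding a Tanh bias to Swish can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the bias enters as an additive term alpha * tanh(x) on top of x * sigmoid(beta * x). The function names swish and swish_t, the parameter names alpha and beta, and their default values are all chosen here for illustration. The Swish-T$_{\textbf{B}}$ and Swish-T$_{\textbf{C}}$ variants are not covered; the linked repository remains the reference for their exact forms.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Original Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def swish_t(x: float, beta: float = 1.0, alpha: float = 0.1) -> float:
    """Swish plus an additive Tanh bias.

    ASSUMPTION: the additive form and the defaults beta=1.0, alpha=0.1
    are illustrative choices, not values confirmed by the text above.
    """
    return swish(x, beta) + alpha * math.tanh(x)


if __name__ == "__main__":
    # Compare both functions on a few inputs; with alpha > 0 the
    # Tanh term pushes outputs for negative inputs further below zero.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  swish={swish(x):+.5f}  swish_t={swish_t(x):+.5f}")
```

Under this assumed form, setting alpha to 0 recovers the original Swish, and because tanh saturates at ±1 the added bias is bounded in magnitude by alpha, so the two functions differ by at most alpha for large |x|.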