TanhSoft -- a family of activation functions combining Tanh and Softplus
This work addresses the need for better activation functions in deep learning, but it is incremental as it builds on existing combinations of tanh and softplus with minor tuning.
The authors tackled the problem of improving deep learning performance by proposing a new family of activation functions called TanhSoft, which combines tanh and softplus with tunable hyper-parameters. The results show that replacing ReLU with specific TanhSoft variants improved top-1 classification accuracy by up to 0.7% on CIFAR-10 and up to 2.57% on CIFAR-100, depending on the model.
Deep learning, at its core, contains functions that are the composition of a linear transformation with a non-linear function known as an activation function. In the past few years, there has been increasing interest in the construction of novel activation functions that result in better learning. In this work, we propose a family of novel activation functions, namely TanhSoft, with four undetermined hyper-parameters α, β, γ, and δ, of the form tanh(αx + βe^{γx}) ln(δ + e^x), and tune these hyper-parameters to obtain activation functions which are shown to outperform several well-known activation functions. For instance, replacing ReLU with x tanh(0.6e^x) improves top-1 classification accuracy on CIFAR-10 by 0.46% for DenseNet-169 and 0.7% for Inception-v3, while with tanh(0.87x) ln(1 + e^x) top-1 classification accuracy on CIFAR-100 improves by 1.24% for DenseNet-169 and 2.57% for the SimpleNet model.
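To make the family concrete, the following is a minimal NumPy sketch of the general form and of the two variants named above. The function names are hypothetical and chosen for illustration only. The hyper-parameter settings are inferred from the formulas: α = 0, β = 0.6, γ = 1, δ = 0 reduces the general form to x tanh(0.6e^x), because ln(e^x) = x, and α = 0.87, β = 0, δ = 1 gives tanh(0.87x) ln(1 + e^x). The sketch assumes δ ≥ 0 and is not the authors' implementation.

```python
import numpy as np


def tanhsoft(x, alpha, beta, gamma, delta):
    """Sketch of tanh(alpha*x + beta*exp(gamma*x)) * ln(delta + exp(x)).

    Assumes delta >= 0; a negative delta would make the logarithm
    undefined for sufficiently negative x.
    """
    x = np.asarray(x, dtype=np.float64)
    inner = alpha * x + beta * np.exp(gamma * x)
    if delta == 0:
        # ln(0 + e^x) simplifies to x; computing it directly avoids
        # overflow/underflow in exp for large |x|.
        log_term = x
    else:
        # ln(delta + e^x) = logaddexp(ln(delta), x), evaluated stably.
        log_term = np.logaddexp(np.log(delta), x)
    return np.tanh(inner) * log_term


def tanhsoft_exp_variant(x):
    # Hypothetical name for x * tanh(0.6 * e^x).
    return tanhsoft(x, alpha=0.0, beta=0.6, gamma=1.0, delta=0.0)


def tanhsoft_softplus_variant(x):
    # Hypothetical name for tanh(0.87 * x) * ln(1 + e^x).
    # gamma is irrelevant here because beta = 0.
    return tanhsoft(x, alpha=0.87, beta=0.0, gamma=0.0, delta=1.0)


if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 7)
    print(tanhsoft_exp_variant(xs))
    print(tanhsoft_softplus_variant(xs))
```

Both variants are zero at x = 0 and approach x for large positive inputs, which is the ReLU-like behaviour one would expect from a drop-in replacement.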