LG · Feb 24, 2022

Activation Functions: Dive into an optimal activation function

arXiv:2202.12065v1
Originality: Synthesis-oriented
AI Analysis

This work addresses how neural network practitioners should choose activation functions, but it is incremental: it builds on existing functions without introducing a fundamentally new approach.

The study searches for an optimal activation function for neural networks by defining it as a weighted sum of existing functions and optimizing the weights during training. It observes that ReLU dominates in the initial layers, while deeper layers prefer more convergent functions.

Activation functions have emerged as one of the essential components of neural networks. The choice of an adequate activation function can affect the accuracy of these methods. In this study, we search for an optimal activation function by defining it as a weighted sum of existing activation functions and then optimizing these weights while training the network. The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets, MNIST, FashionMNIST, and KMNIST. We observe that the ReLU activation function can easily overshadow the other activation functions. Also, we see that initial layers prefer ReLU- or LeakyReLU-type activation functions, but deeper layers tend to prefer more convergent activation functions.
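
The weighted-sum idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights are initialized or whether they are normalized, so the plain unnormalized weights, the squared-error loss, and the names `mixed_activation`, `weight_gradient`, and `fit_weights` are all assumptions made for illustration. Only the mixing weights are trained here, on a toy scalar problem.

```python
import math


def relu(x):
    return max(0.0, x)


# Candidate activations named in the abstract: ReLU, tanh, sin.
CANDIDATES = (relu, math.tanh, math.sin)


def mixed_activation(x, weights):
    """Weighted sum of the candidate activations at a scalar input x.

    Assumption: weights are unnormalized (no softmax or sum-to-one constraint).
    """
    return sum(w * f(x) for w, f in zip(weights, CANDIDATES))


def weight_gradient(x, weights, target):
    """Gradient of 0.5 * (a(x) - target)**2 with respect to each mixing weight.

    Because a(x) is linear in the weights, d a / d w_i is simply f_i(x).
    """
    error = mixed_activation(x, weights) - target
    return [error * f(x) for f in CANDIDATES]


def fit_weights(samples, weights, lr=0.05, epochs=200):
    """Per-sample gradient descent on the mixing weights only (toy setup)."""
    weights = list(weights)
    for _ in range(epochs):
        for x, target in samples:
            grads = weight_gradient(x, weights, target)
            weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


def total_loss(samples, weights):
    """Sum of squared errors over all samples."""
    return sum(0.5 * (mixed_activation(x, weights) - t) ** 2 for x, t in samples)


# Hypothetical toy data: targets generated by a pure ReLU.
samples = [(x, relu(x)) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
initial = [1.0 / 3.0] * 3
learned = fit_weights(samples, initial)
```

In a real network, each layer would presumably hold its own set of mixing weights, updated by backpropagation together with the other parameters; comparing the learned weights across layers is what would surface a per-layer preference of the kind the abstract reports.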

