LG, NE · Jan 13, 2023

Efficient Activation Function Optimization through Surrogate Modeling

arXiv:2301.05785v6 · h-index: 65 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a problem that matters to machine learning practitioners: finding good activation functions without a prohibitively expensive search. It offers a practical and theoretical foundation for activation function optimization, although its improvement over existing search methods is incremental.

The paper tackles the problem of optimizing activation functions for neural networks, a task that is difficult for humans to do by hand and computationally expensive for search algorithms. It creates benchmark datasets, develops a surrogate-based optimization method that uses the Fisher information matrix and the activation function's output distribution, and discovers a sigmoidal activation function that outperforms the others on several real-world tasks, challenging the default use of rectifier nonlinearities.
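The surrogate idea can be illustrated with a small sketch. It assumes that each activation function is characterized by its outputs on a fixed sample of standard normal inputs, and that the surrogate is a k-nearest-neighbor regressor that predicts a candidate's accuracy from the accuracies of the most similar functions already evaluated. The regressor choice, the helper names (`output_features`, `knn_predict`, `propose_next`), and the accuracy values in the usage part are hypothetical illustrations, not the paper's implementation or results; the Fisher information features are left out to keep it short.

```python
import math
import random


def output_features(act, n_samples=1000, seed=0):
    """Outputs of `act` on a fixed sample of N(0, 1) inputs (assumed feature set)."""
    rng = random.Random(seed)  # same seed -> same input points for every function
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return [act(x) for x in xs]


def distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, known, k=3):
    """Hypothetical surrogate: mean accuracy of the k nearest evaluated functions.

    `known` is a list of (feature_vector, accuracy) pairs.
    """
    nearest = sorted(known, key=lambda pair: distance(query, pair[0]))[:k]
    return sum(acc for _, acc in nearest) / len(nearest)


def propose_next(candidates, known, k=3):
    """Name of the unevaluated candidate with the highest predicted accuracy."""
    scores = {name: knn_predict(output_features(fn), known, k)
              for name, fn in candidates.items()}
    return max(scores, key=scores.get)


# Usage with made-up accuracies (placeholders, not results from the paper).
evaluated = {
    "relu": (lambda x: max(0.0, x), 0.90),
    "tanh": (math.tanh, 0.85),
    "sigmoid": (lambda x: 1.0 / (1.0 + math.exp(-x)), 0.80),
}
known = [(output_features(fn), acc) for fn, acc in evaluated.values()]
candidates = {
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "softsign": lambda x: x / (1.0 + abs(x)),
}
print(propose_next(candidates, known, k=1))  # -> leaky_relu (nearest neighbor: relu)
```

Fixing the seed means every function is evaluated at the same input points, so the Euclidean distance compares the functions' shapes directly. In a search loop, only the top-ranked candidate would then be trained, and its measured accuracy appended to `known`.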

Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in several real-world tasks, with a surprising finding: a sigmoidal design that outperformed all other activation functions was discovered, challenging the status quo of always using rectifier nonlinearities in deep learning. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization.
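To make "the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization" concrete, the following sketch computes that matrix for a deliberately tiny, hypothetical model rather than the CNN, ResNet, or ViT architectures used in the benchmarks: a scalar Gaussian input x, one hidden unit h = f(w1 * x), and a Bernoulli output with logit z = w2 * h. For this model F = E_x[p(1 - p) grad(z) grad(z)^T] with p = sigmoid(z), because the expected squared score (y - p)^2 under y ~ p(y | x) equals p(1 - p). The weights w1 and w2 stand in for a random initialization, and the finite-difference derivative is a simplification; both are assumptions of this sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def derivative(f, x, eps=1e-5):
    """Central finite-difference approximation of f'(x) (a simplification)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def fisher_matrix(act, w1, w2, n_samples=2000, seed=0):
    """Monte Carlo estimate of the 2x2 Fisher matrix of a toy model.

    Toy model (hypothetical): x ~ N(0, 1), h = act(w1 * x), z = w2 * h,
    p(y = 1 | x) = sigmoid(z). The expectation over y ~ p(y | x) is exact:
    E[(y - p)^2] = p * (1 - p); only the expectation over x is sampled.
    """
    rng = random.Random(seed)
    f11 = f12 = f22 = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        pre = w1 * x
        h = act(pre)
        p = sigmoid(w2 * h)
        weight = p * (1.0 - p)
        g1 = w2 * derivative(act, pre) * x  # dz/dw1
        g2 = h                              # dz/dw2
        f11 += weight * g1 * g1
        f12 += weight * g1 * g2
        f22 += weight * g2 * g2
    return [[f11 / n_samples, f12 / n_samples],
            [f12 / n_samples, f22 / n_samples]]


def eigenvalues_2x2_symmetric(m):
    """Closed-form eigenvalues (largest first) of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean + radius, mean - radius)


# Usage: w1 = 0.8 and w2 = 1.2 are arbitrary stand-ins for a random initialization.
relu_spec = eigenvalues_2x2_symmetric(fisher_matrix(lambda x: max(0.0, x), 0.8, 1.2))
tanh_spec = eigenvalues_2x2_symmetric(fisher_matrix(math.tanh, 0.8, 1.2))
print("relu spectrum:", relu_spec)
print("tanh spectrum:", tanh_spec)
```

Because ReLU is positively homogeneous, with a positive w1 the two gradient components are proportional in this toy model, so its Fisher matrix has rank one and the smaller eigenvalue is numerically zero, while tanh yields two nonzero eigenvalues. This only shows that the spectrum can differ between activation functions; it does not reproduce which spectral features the paper found predictive of accuracy.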
