Efficient Search for Customized Activation Functions with Gradient Descent
The method provides a practical, automated way to enhance deep learning architectures and applies to tasks such as image classification and language modeling. The contribution is incremental, since it builds on existing gradient-based search techniques.
The paper addresses the problem of finding good activation functions for different deep learning models. It uses gradient-based search to identify high-performing, customized activations, which improves performance across various models and datasets while being orders of magnitude more efficient than prior methods.
Different activation functions work best for different deep learning models. To exploit this, we leverage recent advancements in gradient-based search techniques for neural architectures to efficiently identify high-performing activation functions for a given application. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions, allowing for the exploration of novel activations. Our approach enables the identification of specialized activations, leading to improved performance in every model we tried, from image classification to language models. Moreover, the identified activations exhibit strong transferability to larger models of the same type, as well as new datasets. Importantly, our automated process for creating customized activation functions is orders of magnitude more efficient than previous approaches. It can easily be applied on top of arbitrary deep learning pipelines and thus offers a promising practical avenue for enhancing deep learning architectures.
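To make the idea of a gradient-based search over activation functions concrete, the following is a minimal sketch and not the authors' implementation. It assumes a DARTS-style continuous relaxation: a softmax-weighted mixture of candidate unary operations whose mixing logits are trained by gradient descent and then discretized by taking the largest logit. The candidate set, the names `CANDIDATES`, `mixed_activation`, and `search`, and the toy objective (fitting sampled ReLU values rather than training a network) are all assumptions made for illustration.

```python
import math

# Hypothetical candidate unary operations; the paper's search space may differ.
CANDIDATES = {
    "identity": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_activation(x, logits):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    weights = softmax(logits)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATES.values()))


def search(xs, targets, steps=500, lr=0.5):
    """Gradient descent on the mixing logits under a mean squared error loss."""
    names = list(CANDIDATES)
    logits = [0.0] * len(names)
    for _ in range(steps):
        weights = softmax(logits)
        grads = [0.0] * len(names)
        for x, t in zip(xs, targets):
            outs = [CANDIDATES[n](x) for n in names]
            y = sum(w * o for w, o in zip(weights, outs))
            err = 2.0 * (y - t) / len(xs)
            # For a softmax mixture, dy/dlogit_k = w_k * (o_k - y).
            for k in range(len(names)):
                grads[k] += err * weights[k] * (outs[k] - y)
        logits = [l - lr * g for l, g in zip(logits, grads)]
    # Discretize: keep the operation with the largest logit.
    best = max(range(len(names)), key=lambda k: logits[k])
    return names[best], logits


# Toy objective (an assumption): recover ReLU from sampled input/output pairs.
xs = [i * 0.5 for i in range(-6, 7)]
targets = [max(0.0, x) for x in xs]
best_name, final_logits = search(xs, targets)
```

In a real pipeline the loss would presumably come from training the network that uses the activation, and the search cell would compose several unary and binary operations rather than select a single one; the sketch only shows how a discrete choice becomes differentiable.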