Learning Activation Functions to Improve Deep Neural Networks
This work addresses a limitation familiar to deep learning practitioners: the activation function is normally chosen in advance and stays fixed during training. It proposes a new learned activation rather than an incremental change to an existing one.
The paper replaces the fixed activation function at each neuron with a piecewise linear activation whose shape is learned independently per neuron by gradient descent. Compared with networks built from static rectified linear units, this achieves state-of-the-art performance on CIFAR-10 (7.51% error), CIFAR-100 (30.83% error), and a high-energy physics benchmark involving Higgs boson decay modes.
Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
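The abstract does not state the functional form of the learned activation, so the following is only a minimal sketch under an assumed parameterization: a rectified linear unit plus a sum of S hinge functions, h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s), where each neuron has its own slopes a_s and offsets b_s. The names adaptive_pwl, adaptive_pwl_grads, a, and b are illustrative and do not come from the text above. The second function shows the gradients that gradient descent would need to update a and b.

```python
import numpy as np


def adaptive_pwl(x, a, b):
    """Assumed per-neuron piecewise linear activation (illustrative form).

    x: array of shape (batch, n_neurons), the pre-activations.
    a: array of shape (n_hinges, n_neurons), learnable hinge slopes.
    b: array of shape (n_hinges, n_neurons), learnable hinge offsets.
    """
    relu = np.maximum(0.0, x)
    # Hinge terms, shape (batch, n_hinges, n_neurons).
    hinges = np.maximum(0.0, -x[:, None, :] + b[None, :, :])
    return relu + np.sum(a[None, :, :] * hinges, axis=1)


def adaptive_pwl_grads(x, a, b, upstream):
    """Gradients of sum(upstream * adaptive_pwl(x, a, b)) w.r.t. a and b.

    upstream: array of shape (batch, n_neurons), the gradient flowing
    back from the loss into the activation output.
    """
    shifted = -x[:, None, :] + b[None, :, :]
    active = (shifted > 0).astype(x.dtype)  # 1 where a hinge is "on"
    # d/da_s = max(0, -x + b_s), accumulated over the batch.
    grad_a = np.sum(upstream[:, None, :] * np.maximum(0.0, shifted), axis=0)
    # d/db_s = a_s where the hinge is active, else 0.
    grad_b = np.sum(upstream[:, None, :] * a[None, :, :] * active, axis=0)
    return grad_a, grad_b


# One plain gradient descent step on the activation parameters.
x = np.array([[-1.0, 2.0]])      # one example, two neurons
a = np.array([[0.5, 0.5]])       # one hinge per neuron
b = np.zeros((1, 2))
grad_a, grad_b = adaptive_pwl_grads(x, a, b, upstream=np.ones_like(x))
learning_rate = 0.1              # arbitrary value for illustration
a_new = a - learning_rate * grad_a
b_new = b - learning_rate * grad_b
```

Under this assumed form, setting every slope in a to zero recovers an ordinary rectified linear unit, so a network of such units can start from a static ReLU baseline and let training reshape each neuron's activation.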