SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks
SmartMixed addresses the need for adaptive activation functions to improve neural network performance, but the contribution is incremental: it builds on existing activation functions and is evaluated on a single dataset (MNIST).
The paper tackles the problem of fixed activation functions in neural networks by introducing SmartMixed, a two-phase training strategy that learns per-neuron activation functions. On MNIST with feedforward networks, neurons in different layers develop distinct activation-function preferences.
The choice of activation function plays a critical role in neural networks, yet most architectures still rely on fixed, uniform activation functions across all neurons. We introduce SmartMixed, a two-phase training strategy that allows networks to learn optimal per-neuron activation functions while preserving computational efficiency at inference. In the first phase, neurons adaptively select from a pool of candidate activation functions (ReLU, Sigmoid, Tanh, Leaky ReLU, ELU, SELU) using a differentiable hard-mixture mechanism. In the second phase, each neuron's activation function is fixed according to the learned selection, resulting in a computationally efficient network that supports continued training with optimized vectorized operations. We evaluate SmartMixed on the MNIST dataset using feedforward neural networks of varying depths. The analysis shows that neurons in different layers exhibit distinct preferences for activation functions, providing insights into the functional diversity within neural architectures.
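The two phases can be illustrated with a minimal sketch. The abstract does not specify the hard-mixture mechanism, so the sketch assumes each neuron holds one selection logit per candidate and applies the highest-scoring candidate in the forward pass; how gradients reach the logits (for example, a straight-through estimator) is not shown. All names (`phase1_forward`, `freeze_selection`, `phase2_forward`) and the constants for Leaky ReLU, ELU, and SELU are illustrative assumptions, not the authors' implementation.

```python
import math

# Commonly used default constants; assumed here, not taken from the paper.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805
LEAKY_SLOPE = 0.01

# Candidate pool named in the abstract, written as scalar functions.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 0.5 * (1.0 + math.tanh(0.5 * x)),  # stable form of 1/(1+e^-x)
    "tanh": math.tanh,
    "leaky_relu": lambda x: x if x > 0 else LEAKY_SLOPE * x,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "selu": lambda x: SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0)),
}
NAMES = list(ACTIVATIONS)


def hard_choice(logits):
    """Index of the highest-scoring candidate for one neuron."""
    return max(range(len(NAMES)), key=lambda k: logits[k])


def phase1_forward(pre_activations, selection_logits):
    """Phase 1 (assumed form): every neuron applies its currently preferred
    candidate. Only the forward pass is shown; training the logits would
    need a differentiable surrogate that this sketch omits."""
    return [
        ACTIVATIONS[NAMES[hard_choice(logits)]](z)
        for z, logits in zip(pre_activations, selection_logits)
    ]


def freeze_selection(selection_logits):
    """Transition to phase 2: record each neuron's choice once and group
    neuron indices by activation, so each group can be processed together."""
    groups = {}
    for i, logits in enumerate(selection_logits):
        groups.setdefault(NAMES[hard_choice(logits)], []).append(i)
    return groups


def phase2_forward(pre_activations, groups):
    """Phase 2: no selection logic remains; each group of neurons applies
    its fixed activation (a framework would vectorize each group)."""
    out = [0.0] * len(pre_activations)
    for name, indices in groups.items():
        f = ACTIVATIONS[name]
        for i in indices:
            out[i] = f(pre_activations[i])
    return out


# Usage: three neurons, two preferring ReLU and one preferring Tanh.
z = [-1.0, 0.5, 2.0]
logits = [
    [5.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
groups = freeze_selection(logits)
frozen_output = phase2_forward(z, groups)
```

Grouping neurons by their frozen choice is what lets the second phase drop the per-neuron selection step: each activation is applied once to its own group of indices, and the frozen network produces the same outputs as the final phase-one forward pass.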