Neural networks with trainable matrix activation functions
This work, aimed at researchers and practitioners, addresses a limitation of standard neural network training by making activation functions trainable; the contribution is incremental, since it builds on existing ReLU-based activations.
The paper tackles the problem of fixed activation functions in neural networks by developing trainable matrix activation functions derived from ReLU, whose parameters are optimized alongside the network weights. Numerical experiments indicate that networks using these activations are robust and efficient.
The training process of neural networks usually optimizes the weights and bias parameters of linear transformations, while the nonlinear activation functions are pre-specified and fixed. This work develops a systematic approach to constructing matrix-valued activation functions whose entries are generalized from ReLU. The activation is computed as a matrix-vector multiplication that requires only scalar multiplications and comparisons. The proposed activation functions depend on parameters that are trained along with the weights and bias vectors. Neural networks based on this approach are simple and efficient, and numerical experiments show them to be robust.
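The abstract does not spell out the construction, so the following is only a minimal sketch under explicit assumptions: the matrix D(x) is diagonal, each diagonal entry is a piecewise-constant function of the corresponding input x_i with fixed breakpoints shared by all coordinates, and only the constant values on each interval are trained. With a single breakpoint at 0 and values (0, 1) it reduces to ReLU. The names `DiagonalMatrixActivation`, `squared_loss`, and `sgd_step` are hypothetical and are not the authors' code.

```python
from bisect import bisect_right
from typing import List


class DiagonalMatrixActivation:
    """Hypothetical trainable activation sigma(x) = D(x) x with diagonal D(x).

    Assumption: the breakpoints s_1 < ... < s_m are fixed and shared by all
    coordinates; on the j-th interval the diagonal entry D(x)_ii equals the
    trainable scalar values[i][j]. ReLU is breakpoints=[0.0], values[i]=[0.0, 1.0].
    """

    def __init__(self, breakpoints: List[float], values: List[List[float]]):
        self.breakpoints = breakpoints
        self.values = values

    def interval(self, xi: float) -> int:
        # Only comparisons are needed to locate the interval containing xi.
        return bisect_right(self.breakpoints, xi)

    def forward(self, x: List[float]) -> List[float]:
        # (D(x) x)_i = D(x)_ii * x_i: one scalar multiplication per entry.
        return [self.values[i][self.interval(xi)] * xi for i, xi in enumerate(x)]

    def grad_values(self, x: List[float], upstream: List[float]) -> List[List[float]]:
        # d sigma_i / d values[i][j] = x_i if x_i lies in interval j, else 0,
        # chained with the upstream gradient d loss / d sigma_i.
        grads = [[0.0] * len(row) for row in self.values]
        for i, xi in enumerate(x):
            grads[i][self.interval(xi)] += upstream[i] * xi
        return grads


def squared_loss(out: List[float], target: List[float]) -> float:
    # Toy loss 0.5 * ||out - target||^2, used only to illustrate training.
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def sgd_step(act: DiagonalMatrixActivation, x: List[float],
             target: List[float], lr: float) -> None:
    # One plain gradient-descent step on the activation parameters.
    out = act.forward(x)
    upstream = [o - t for o, t in zip(out, target)]
    grads = act.grad_values(x, upstream)
    for i, row in enumerate(grads):
        for j, g in enumerate(row):
            act.values[i][j] -= lr * g


# Usage: start from ReLU on a 2-dimensional input and take one training step.
relu = DiagonalMatrixActivation([0.0], [[0.0, 1.0] for _ in range(2)])
x, target = [-1.0, 2.0], [-0.5, 1.0]
before = squared_loss(relu.forward(x), target)
sgd_step(relu, x, target, lr=0.1)
after = squared_loss(relu.forward(x), target)
```

In this toy case one step lowers the loss from 0.625 to about 0.28, and the value on the negative interval for the first coordinate moves away from 0, so that coordinate no longer acts as ReLU. Keeping the breakpoints fixed means each evaluation costs only comparisons plus one multiplication per entry, consistent with the cost profile described in the abstract; in a full network these parameters would be updated together with the weights and biases.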