Improving neural networks with bunches of neurons modeled by Kumaraswamy units: Preliminary study
This is an incremental improvement for neural network practitioners, focusing on activation functions in shallow networks.
The paper aims to improve neural networks by introducing a new activation function, the Kumaraswamy unit, based on the generalized Kumaraswamy distribution. It reports a significant drop in test classification error and test cross-entropy on MNIST compared to sigmoid, ReLU, and Noisy ReLU.
Deep neural networks have recently achieved state-of-the-art results in many machine learning problems, e.g., speech recognition and object recognition. Work to date on rectified linear units (ReLU) provides empirical and theoretical evidence of improved neural network performance compared to the commonly used sigmoid activation function. In this paper, we investigate a new way of improving neural networks by introducing a bunch of copies of the same neuron modeled by the generalized Kumaraswamy distribution. As a result, we propose a novel non-linear activation function, which we refer to as the Kumaraswamy unit, that is closely related to ReLU. In an experimental study on the MNIST image corpus, we evaluate the Kumaraswamy unit applied to a single-layer (shallow) neural network and report a significant drop in test classification error and test cross-entropy in comparison to the sigmoid unit, ReLU, and Noisy ReLU.
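The abstract does not state the functional form of the Kumaraswamy unit. As a minimal sketch, assuming the unit applies the cumulative distribution function of the Kumaraswamy distribution, 1 - (1 - u^a)^b, to a sigmoid output u, it might look like the following. The function names and the choice of shape parameters a and b are illustrative assumptions, not details taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kumaraswamy_cdf(u: float, a: float, b: float) -> float:
    """CDF of the Kumaraswamy distribution for u in [0, 1] and a, b > 0."""
    return 1.0 - (1.0 - u ** a) ** b


def kumaraswamy_unit(x: float, a: float, b: float) -> float:
    """Assumed form of the unit: Kumaraswamy CDF applied to sigmoid(x).

    This composition is an assumption made for illustration; the abstract
    only says the unit is based on the generalized Kumaraswamy distribution.
    """
    return kumaraswamy_cdf(sigmoid(x), a, b)


if __name__ == "__main__":
    # Hypothetical shape parameters, chosen only to show the effect.
    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), kumaraswamy_unit(x, a=2.0, b=2.0))
```

Under this assumption, setting a = b = 1 recovers the plain sigmoid, while other positive values reshape the curve and keep the output in [0, 1].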