NE · LG · May 22, 2018

ARiA: Utilizing Richard's Curve for Controlling the Non-monotonicity of the Activation Function in Deep Neural Nets

arXiv:1805.08878v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better activation functions in deep learning and offers a potential replacement for widely used units such as ReLU. It appears incremental, however, because it builds on existing non-monotonic designs such as Swish.

The authors tackle the problem of improving activation functions in deep neural networks by introducing ARiA, a non-monotonic activation based on Richard's Curve. They report that it significantly outperforms ReLU and Swish on the MNIST, CIFAR-10, and CIFAR-100 datasets.

This work introduces a novel activation unit that can be efficiently employed in deep neural nets (DNNs) and performs significantly better than the traditional Rectified Linear Unit (ReLU). The function is a two-parameter version of the specialized Richard's Curve, which we call Adaptive Richard's Curve weighted Activation (ARiA). Like the recently introduced Swish, it is non-monotonic, but it allows precise control over its non-monotonic convexity by varying the hyper-parameters. We first demonstrate the mathematical significance of the two-parameter ARiA and then apply it to benchmark problems such as MNIST, CIFAR-10 and CIFAR-100, comparing its performance with ReLU and Swish units. Our results show significantly superior performance on all these datasets, making ARiA a potential replacement for ReLU and other activations in DNNs.
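The abstract does not spell out the formula, so the following is a minimal sketch under a stated assumption. Start from the generalized logistic (Richards') curve Y(x) = A + (K - A) / (C + Q e^(-Bx))^(1/ν), fix A = 0 and K = C = Q = 1, and use the result to weight the input. That gives a two-parameter form f(x) = x (1 + e^(-βx))^(-α), with α = 1/ν and β = B. Under this assumption, α = 1 reduces the weighting to a sigmoid and recovers Swish. The names `aria2`, `swish`, `relu`, `alpha` and `beta` are illustrative, not taken from the paper's code.

```python
import numpy as np


def aria2(x, alpha=1.0, beta=1.0):
    """Assumed two-parameter ARiA form: x * (1 + exp(-beta * x)) ** (-alpha).

    Derived (by assumption) from Richards' curve with A=0, K=C=Q=1,
    B=beta and 1/nu=alpha. Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    # (1 + exp(-beta*x)) ** (-alpha) evaluated in log space to avoid overflow:
    # np.logaddexp(0, z) computes log(1 + exp(z)) stably.
    weight = np.exp(-alpha * np.logaddexp(0.0, -beta * x))
    return x * weight


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x)."""
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.exp(-beta * x))


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return np.maximum(0.0, np.asarray(x, dtype=float))


if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 1201)
    # With alpha = 1 the weighting is a plain sigmoid, so the sketch matches Swish.
    print("matches Swish:", np.allclose(aria2(xs, 1.0, 1.0), swish(xs, 1.0)))
    # Report the depth and location of the non-monotonic dip for a few alphas.
    for alpha in (0.5, 1.0, 2.0):
        ys = aria2(xs, alpha=alpha, beta=1.0)
        i = np.argmin(ys)
        print(f"alpha={alpha}: minimum {ys[i]:.4f} at x={xs[i]:.2f}")
```

In this form, a smaller α deepens the negative dip: for every x < 0 the weight (1 + e^(-βx))^(-α) shrinks as α grows, so the curve is pulled toward zero. A large β sharpens the transition and pushes the curve toward ReLU, which is one way to read the paper's claim of control over the non-monotonic region.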
