LG · ML · Oct 30, 2017

Empirical analysis of non-linear activation functions for Deep Neural Networks in classification tasks

arXiv:1710.11272v1 · 16 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the choice of activation functions for practitioners in machine learning, but it is incremental as it builds on existing methods without introducing new paradigms.

The paper tackled the problem of selecting effective non-linear activation functions for deep neural networks in classification tasks, achieving impressive accuracy results on the MNIST dataset through empirical analysis and architectural optimization.

We provide an overview of several non-linear activation functions in a neural network architecture that have proven successful in many machine learning applications. We conduct an empirical analysis of the effectiveness of using these functions on the MNIST classification task, with the aim of clarifying which functions produce the best results overall. Based on this first set of results, we examine the effects of building deeper architectures with an increasing number of hidden layers. We also survey the impact, on the same task, of using different initialisation schemes for the weights of our neural network. Using these sets of experiments as a base, we conclude by providing an optimal neural network architecture that yields impressive results in accuracy on the MNIST classification task.
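To make the two ingredients named in the abstract concrete, here is a minimal sketch of a few common non-linear activation functions and two widely used weight initialisation schemes. The abstract does not say which functions or schemes the paper compares, so the selection below (sigmoid, ReLU, leaky ReLU, ELU, Glorot uniform, He normal) is an assumption made for illustration. The helper names, including `dense_forward`, are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Logistic sigmoid, written to avoid overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    # Rectified linear unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small slope alpha for negative inputs.
    return x if x > 0 else alpha * x


def elu(x, alpha=1.0):
    # Exponential linear unit: saturates smoothly to -alpha for negative inputs.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def glorot_uniform(fan_in, fan_out, rng):
    # Glorot (Xavier) uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def he_normal(fan_in, fan_out, rng):
    # He normal: zero-mean Gaussian with standard deviation sqrt(2 / fan_in).
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def dense_forward(inputs, weights, activation):
    # One fully connected layer without bias (hypothetical helper):
    # output[j] = activation(sum_i inputs[i] * weights[i][j]).
    fan_out = len(weights[0])
    return [
        activation(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(fan_out)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    for name, act in [("sigmoid", sigmoid), ("relu", relu),
                      ("leaky_relu", leaky_relu), ("elu", elu)]:
        w = he_normal(8, 4, rng)
        print(name, dense_forward(x, w, act))
```

Swapping the `activation` argument or the initialiser while keeping everything else fixed is the kind of controlled comparison the abstract describes. A real experiment would stack several such layers and train them on MNIST.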

