LG | NE | ML | Feb 10, 2020

On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks

arXiv:2002.04060v1 | 26 citations
AI Analysis

This paper offers what its authors believe to be the first theoretical justification for using softmax output layers in neural networks for pattern classification, addressing a foundational gap in machine learning theory.

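The paper extends universal approximation theory to neural networks with ReLU activations and softmax output layers. It proves that a sufficiently large ReLU network can approximate any function in $L^1$ to arbitrary precision, and that a sufficiently large network with a softmax output layer can approximate any indicator function in $L^1$, which is the form that mutually exclusive class labels take in multi-class classification.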

In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ to arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to mutually exclusive class labels in any realistic multi-class pattern classification problem. To the best of our knowledge, this work is the first theoretical justification for using softmax output layers in neural networks for pattern classification.
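The two claims above can be made concrete with a small numerical sketch. It is not the paper's proof or construction: it is a hand-built one-dimensional illustration, assuming a target indicator of the interval [0.3, 0.7] on [0, 1], a four-unit ReLU "trapezoid" with a hypothetical ramp width `d`, and a two-class softmax with a hypothetical logit scale `s`. All function and parameter names are illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def ramp(x, c, d):
    # Two ReLU units: 0 for x <= c, 1 for x >= c + d, linear in between.
    return relu((x - c) / d) - relu((x - c - d) / d)


def relu_trapezoid(x, a, b, d):
    # Four ReLU units: rises on [a, a + d], equals 1 on [a + d, b - d],
    # falls on [b - d, b], and is 0 elsewhere (assumes b - a >= 2 * d).
    return ramp(x, a, d) - ramp(x, b - d, d)


def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def softmax_classifier(x, a, b, d, s):
    # Two-class softmax on top of the ReLU trapezoid. The scale s is an
    # assumption of this sketch: larger s pushes the output toward 0 or 1.
    t = relu_trapezoid(x, a, b, d)
    logits = s * np.stack([t - 0.5, 0.5 - t], axis=1)
    return softmax(logits)[:, 0]  # probability of the "inside" class


def l1_error(approx, target):
    # Mean over a uniform grid on [0, 1] approximates the L^1 distance.
    return float(np.mean(np.abs(approx - target)))


x = np.linspace(0.0, 1.0, 200001)
a, b = 0.3, 0.7
indicator = ((x >= a) & (x <= b)).astype(float)

widths = (0.1, 0.01, 0.001)
errors_relu = {d: l1_error(relu_trapezoid(x, a, b, d), indicator) for d in widths}
errors_softmax = {
    d: l1_error(softmax_classifier(x, a, b, d, s=50.0), indicator) for d in widths
}

for d in widths:
    print(f"d={d}: relu L1 error={errors_relu[d]:.5f}, "
          f"softmax L1 error={errors_softmax[d]:.5f}")
```

In this construction the ReLU trapezoid differs from the indicator only on the two ramps, so its $L^1$ error is the area of two triangles, which equals `d` and can be driven toward zero by narrowing the ramps. The softmax output behaves similarly once `s` is large, which is the one-dimensional flavour of the indicator-function result; the general statement in the paper covers arbitrary functions in $L^1$ and is not reproduced here.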

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
