Neural Network with Unbounded Activation Functions is Universal Approximator
This work provides theoretical justification for the use of ReLU in deep learning. The contribution is incremental in that it builds on existing universal approximation theory.
The paper addresses whether neural networks with unbounded activation functions such as ReLU retain the universal approximation property. It further shows that, subject to a constructive admissibility condition, the trained network can be obtained by discretizing the ridgelet transform, without backpropagation.
This paper presents an investigation of the approximation property of neural networks with unbounded activation functions, such as the rectified linear unit (ReLU), which is the new de-facto standard of deep learning. The ReLU network can be analyzed by the ridgelet transform with respect to Lizorkin distributions. By establishing three reconstruction formulas, using the Fourier slice theorem, the Radon transform, and Parseval's relation, the paper shows that a neural network with unbounded activation functions still satisfies the universal approximation property. As an additional consequence, the ridgelet transform, or the backprojection filter in the Radon domain, is what the network learns after backpropagation. Subject to a constructive admissibility condition, the trained network can be obtained by simply discretizing the ridgelet transform, without backpropagation. Numerical examples not only support the consistency of the admissibility condition but also imply that some non-admissible cases result in low-pass filtering.
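
To make the idea of obtaining a network "without backpropagation" concrete, the following is a minimal sketch using a much simpler one-dimensional stand-in, not the paper's ridgelet construction. It relies on the elementary identity f(x) = ∫ f''(b) ReLU(x - b) db, which holds when f and f' vanish as b → -∞ (integrate by parts twice). Discretizing that integral on a grid of biases gives a one-hidden-layer ReLU network whose output weights are written down in closed form, with no training loop. The target function, the grid range, the number of hidden units, and all function names are assumptions chosen for illustration.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def f(x):
    # Target function (hypothetical choice): a Gaussian bump.
    return math.exp(-x * x)

def f_second(b):
    # Closed-form second derivative of the target above.
    return (4.0 * b * b - 2.0) * math.exp(-b * b)

def build_network(lo=-8.0, hi=8.0, n_hidden=2000):
    # Midpoint-rule discretization of f(x) = integral of f''(b) * relu(x - b) db.
    # Hidden unit j computes relu(x - b_j); its output weight is f''(b_j) * db.
    db = (hi - lo) / n_hidden
    biases = [lo + (j + 0.5) * db for j in range(n_hidden)]
    weights = [f_second(b) * db for b in biases]
    return biases, weights

def network(x, biases, weights):
    # One-hidden-layer ReLU network with unit input weights.
    return sum(c * relu(x - b) for b, c in zip(biases, weights))

biases, weights = build_network()
grid = [-3.0 + 0.1 * k for k in range(61)]
max_err = max(abs(network(x, biases, weights) - f(x)) for x in grid)
print(f"max abs error on [-3, 3]: {max_err:.2e}")
```

In this toy setting the coefficient function is simply f''(b). In the paper's setting that role is played by the ridgelet transform of f, which also depends on the input weight a and on an admissible choice of analyzing function, so the sketch should be read only as an analogy for "discretize an integral representation to get the weights".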