LGMLJun 16, 2020

Minimum Width for Universal Approximation

arXiv:2006.08859v1155 citations
Originality Highly original
AI Analysis

This provides a foundational theoretical characterization for neural network design, addressing a key gap in understanding width requirements for universal approximation in machine learning.

The paper tackles the problem of determining the minimum width needed for neural networks with ReLU activation to universally approximate functions, proving that the exact minimum width is max{d_x+1, d_y} for L^p approximation, and shows this does not hold for uniform approximation with ReLU alone but does with an added threshold function.

The universal approximation property of width-bounded networks has been studied as a dual of classical universal approximation results on depth-bounded networks. However, the critical width enabling the universal approximation has not been exactly characterized in terms of the input dimension $d_x$ and the output dimension $d_y$. In this work, we provide the first definitive result in this direction for networks using the ReLU activation functions: The minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$. We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function. Our proof technique can be also used to derive a tighter upper bound on the minimum width required for the universal approximation using networks with general activation functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes