LGMLSep 19, 2023

Minimum width for universal approximation using ReLU networks on compact domain

arXiv:2309.10402v221 citationsh-index: 3
Originality Incremental advance
AI Analysis

This provides theoretical insights into neural network architecture design, particularly for practitioners working with compact data domains, though it is incremental in refining existing bounds.

The paper determines the exact minimum width for universal approximation using ReLU-like networks on compact domains, showing it is max{d_x, d_y, 2} for L^p approximation, which is smaller than the known result for unbounded domains, and establishes a lower bound for uniform approximation, revealing a dichotomy between L^p and uniform approximations.

It has been shown that deep neural networks of a large enough width are universal approximators but they are not if the width is too small. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\smash{\mathbb R^{d_x}}$, our result first shows that approximation on a compact domain requires smaller width than on $\smash{\mathbb R^{d_x}}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes