LG | Apr 10, 2025

Minimum width for universal approximation using squashable activation functions

arXiv:2504.07371v1 | 3 citations | h-index: 3 | ICML
Originality: Highly original
AI Analysis

This work establishes the exact minimum network width required for universal approximation with a broad class of activation functions, a foundational question in neural network design.

The paper determines the minimum width that neural networks with general activation functions need in order to achieve universal approximation. For squashable activation functions, it shows that the minimum width for approximating L^p functions is max{d_x, d_y, 2} whenever d_x and d_y are not both 1; in the case d_x = d_y = 1, the same bound holds if the activation function is monotone.
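The width formula in this summary can be written as a small helper. This is only an illustrative sketch: the function name `min_width` and the `monotone` flag are hypothetical, and the helper returns `None` for the one case (d_x = d_y = 1 with a non-monotone activation) where the summary does not state a value.

```python
from typing import Optional


def min_width(d_x: int, d_y: int, monotone: bool = False) -> Optional[int]:
    """Minimum width for L^p universal approximation with a squashable activation.

    Hypothetical helper: encodes only the bound max{d_x, d_y, 2} as stated
    in the summary, nothing beyond it.
    """
    if d_x < 1 or d_y < 1:
        raise ValueError("dimensions must be positive integers")
    if d_x == 1 and d_y == 1 and not monotone:
        # The stated result covers d_x = d_y = 1 only for monotone activations.
        return None
    return max(d_x, d_y, 2)


if __name__ == "__main__":
    print(min_width(3, 5))                  # input dim 3, output dim 5
    print(min_width(1, 1, monotone=True))   # scalar case, monotone activation
    print(min_width(1, 1))                  # scalar case, not covered here
```

Note that the bound never drops below 2, even when both the input and output dimensions are 1.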

The exact minimum width that allows for universal approximation by unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternately composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
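To give a feel for what squashability asks for, the sketch below uses the logistic sigmoid (a non-affine analytic function) as an assumed example. It wraps the activation in one affine map on each side to imitate the identity on [0, 1] and the binary step away from 0. This is a simplified illustration rather than the paper's definition or construction, which allows alternating several such compositions; the function names and the constants `eps` and `k` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def approx_identity(x: float, eps: float = 1e-3) -> float:
    # Near 0, sigmoid(z) is close to 1/2 + z/4, so zooming in with the
    # affine map x -> eps*x and undoing the scale afterwards recovers x.
    return (sigmoid(eps * x) - 0.5) * 4.0 / eps


def approx_step(x: float, k: float = 1e3) -> float:
    # Zooming out with x -> k*x pushes the output towards 0 or 1.
    return sigmoid(k * x)


if __name__ == "__main__":
    grid = [i / 100 for i in range(101)]
    err = max(abs(approx_identity(x) - x) for x in grid)
    print("max identity error on [0, 1]:", err)
    print("step at -0.05 and +0.05:", approx_step(-0.05), approx_step(0.05))
```

Shrinking `eps` and growing `k` tightens both approximations, which is the sense in which a single activation can play the role of both an identity and a step.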
