LGCAMLMay 21, 2019

Universal Approximation with Deep Narrow Networks

arXiv:1905.08539v2425 citations
Originality Highly original
AI Analysis

This work addresses a foundational theoretical problem in machine learning by establishing universal approximation properties for deep narrow networks, which is incremental as it extends classical theorems to a dual scenario.

The paper tackles the problem of universal approximation for neural networks with bounded width and arbitrary depth, showing that networks of width n+m+2 with any nonaffine continuous activation function are dense in continuous functions on compact sets, covering all practical activation functions and including polynomial ones. This result provides a qualitative difference from classical wide shallow networks and extends to cases like nowhere differentiable functions and noncompact domains.

The classical Universal Approximation Theorem holds for neural networks of arbitrary width and bounded depth. Here we consider the natural `dual' scenario for networks of bounded width and arbitrary depth. Precisely, let $n$ be the number of inputs neurons, $m$ be the number of output neurons, and let $ρ$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $ρ$, is dense in $C(K; \mathbb{R}^m)$ for $K \subseteq \mathbb{R}^n$ with $K$ compact. This covers every activation function possible to use in practice, and also includes polynomial activation functions, which is unlike the classical version of the theorem, and provides a qualitative difference between deep narrow networks and shallow wide networks. We then consider several extensions of this result. In particular we consider nowhere differentiable activation functions, density in noncompact domains with respect to the $L^p$-norm, and how the width may be reduced to just $n + m + 1$ for `most' activation functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes