LG MLMay 19, 2022

Neural Network Architecture Beyond Width and Depth

arXiv:2205.09459v416.924 citationsh-index: 30

Originality Highly original

AI Analysis

This work addresses the fundamental limitation of neural network expressiveness for researchers and practitioners in machine learning, offering a novel architectural paradigm rather than an incremental improvement.

The paper tackles the problem of limited expressiveness in standard neural networks by introducing a new three-dimensional architecture with an additional 'height' dimension, called NestNet, which achieves significantly better approximation error rates, e.g., O(n^{-(s+1)/d}) vs. O(n^{-2}/d) for standard networks with O(n) parameters.

This paper proposes a new neural network architecture by introducing an additional dimension called height beyond width and depth. Neural network architectures with height, width, and depth as hyper-parameters are called three-dimensional architectures. It is shown that neural networks with three-dimensional architectures are significantly more expressive than the ones with two-dimensional architectures (those with only width and depth as hyper-parameters), e.g., standard fully connected networks. The new network architecture is constructed recursively via a nested structure, and hence we call a network with the new architecture nested network (NestNet). A NestNet of height $s$ is built with each hidden neuron activated by a NestNet of height $\le s-1$. When $s=1$, a NestNet degenerates to a standard network with a two-dimensional architecture. It is proved by construction that height-$s$ ReLU NestNets with $\mathcal{O}(n)$ parameters can approximate $1$-Lipschitz continuous functions on $[0,1]^d$ with an error $\mathcal{O}(n^{-(s+1)/d})$, while the optimal approximation error of standard ReLU networks with $\mathcal{O}(n)$ parameters is $\mathcal{O}(n^{-2/d})$. Furthermore, such a result is extended to generic continuous functions on $[0,1]^d$ with the approximation error characterized by the modulus of continuity. Finally, we use numerical experimentation to show the advantages of the super-approximation power of ReLU NestNets.

View on arXiv PDF

Similar