The power of deeper networks for expressing natural functions
This provides theoretical insights into neural network design for researchers and practitioners, though it is incremental as it builds on known universality results.
The paper tackles the problem of understanding why deeper neural networks are more powerful than shallow ones by proving that for approximating multivariate polynomials, deep networks require neurons growing linearly with the number of variables, while shallow networks require exponential growth, and suggests that practical expressibility needs layers growing only logarithmically.
It is well-known that neural networks are universal approximators, but that deeper networks tend in practice to be more powerful than shallower ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for practical expressibility grows only logarithmically with $n$.