LG, NE | Oct 13, 2016

Why Deep Neural Networks for Function Approximation?

arXiv:1610.04161v2 | 408 citations
Originality: Highly original
AI Analysis

This paper provides a theoretical justification for the empirical success of deep learning in function approximation, a foundational question in machine learning.

The paper asks why deep neural networks are preferred over shallow ones for function approximation. It shows that, for a large class of piecewise smooth functions, shallow networks need exponentially more neurons than deep networks to reach a given approximation error ε: deep networks need O(polylog(1/ε)) neurons, whereas shallow networks need Ω(poly(1/ε)).

Recently there has been much interest in understanding why deep neural networks are preferred to shallow networks. We show that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the corresponding number of neurons needed by a deep network for a given degree of function approximation. First, we consider univariate functions on a bounded interval and require a neural network to achieve an approximation error of $\varepsilon$ uniformly over the interval. We show that shallow networks (i.e., networks whose depth does not depend on $\varepsilon$) require $\Omega(\text{poly}(1/\varepsilon))$ neurons while deep networks (i.e., networks whose depth grows with $1/\varepsilon$) require $\mathcal{O}(\text{polylog}(1/\varepsilon))$ neurons. We then extend these results to certain classes of important multivariate functions. Our results are derived for neural networks which use a combination of rectifier linear units (ReLUs) and binary step units, two of the most popular types of activation functions. Our analysis builds on a simple observation: the multiplication of two bits can be represented by a ReLU.
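The closing observation of the abstract can be checked directly. The sketch below shows one way to write the product of two bits with a single ReLU, namely $ab = \max(0, a + b - 1)$ for $a, b \in \{0, 1\}$. The abstract does not spell out the formula, so this identity and the function names `relu`, `bit_product`, and `gate` are illustrative assumptions rather than the paper's own construction. The second function covers a closely related case, a bit multiplied by a real number in $[0, 1]$, which is also an assumption added here for illustration.

```python
from itertools import product


def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def bit_product(a: int, b: int) -> float:
    """Product of two bits a, b in {0, 1} using one ReLU.

    Assumed identity (not quoted from the abstract): a * b = relu(a + b - 1).
    The affine input a + b - 1 is positive only when both bits are 1.
    """
    return relu(a + b - 1)


def gate(bit: int, y: float) -> float:
    """Product of a bit and a real y in [0, 1] using one ReLU.

    Illustrative extension: bit * y = relu(y + bit - 1).
    If bit == 0 the input is y - 1 <= 0, so the output is 0.
    If bit == 1 the input is y >= 0, so the output is y.
    """
    return relu(y + bit - 1)


# Exhaustive check over all four bit pairs.
for a, b in product((0, 1), repeat=2):
    assert bit_product(a, b) == a * b
```

Because the ReLU input is an affine function of the two bits, the product costs a single neuron. This is only a check of the identity on its four inputs, not a reconstruction of the network used in the paper.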

