LGCCMLFeb 27, 2017

Depth Separation for Neural Networks

arXiv:1702.08489v180 citations
Originality Highly original
AI Analysis

This provides a theoretical separation result for neural network depth, addressing a foundational problem in understanding the expressivity of deep learning architectures.

The paper tackles the problem of approximating functions of the form f(x,x') = g(<x,x'>) with neural networks, showing that depth two networks with bounded weights cannot approximate such functions when g is not approximated by low-degree polynomials, requiring exponentially many neurons for cases like g(x) = sin(πd^3x). It establishes a separation between depth two and depth three networks, as depth three networks can approximate these functions with polynomial size and bounded weights.

Let $f:\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}\to\mathbb{S}$ be a function of the form $f(\mathbf{x},\mathbf{x}') = g(\langle\mathbf{x},\mathbf{x}'\rangle)$ for $g:[-1,1]\to \mathbb{R}$. We give a simple proof that shows that poly-size depth two neural networks with (exponentially) bounded weights cannot approximate $f$ whenever $g$ cannot be approximated by a low degree polynomial. Moreover, for many $g$'s, such as $g(x)=\sin(πd^3x)$, the number of neurons must be $2^{Ω\left(d\log(d)\right)}$. Furthermore, the result holds w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$. As many functions of the above form can be well approximated by poly-size depth three networks with poly-bounded weights, this establishes a separation between depth two and depth three networks w.r.t.\ the uniform distribution on $\mathbb{S}^{d-1}\times \mathbb{S}^{d-1}$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes