On the number of response regions of deep feed forward networks with piece-wise linear activations
This provides a theoretical foundation for understanding why deep networks outperform shallow ones in machine learning, though it is incremental as it builds on existing computational geometry frameworks.
The paper tackles the problem of comparing the representational power of deep versus shallow neural networks with piecewise linear activations by analyzing the number of linear regions they can produce. It shows that deep models have significantly more linear regions than shallow ones, with asymptotic bounds like Ω(⌊n/n₀⌋^{k-1}n^{n₀}) for deep networks versus O(k^{n₀}n^{n₀}) for shallow ones, indicating superior complexity.
This paper explores the complexity of deep feedforward networks with linear pre-synaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piecewise linear functions based on computational geometry. We look at a deep rectifier multi-layer perceptron (MLP) with linear outputs units and compare it with a single layer version of the model. In the asymptotic regime, when the number of inputs stays constant, if the shallow model has $kn$ hidden units and $n_0$ inputs, then the number of linear regions is $O(k^{n_0}n^{n_0})$. For a $k$ layer model with $n$ hidden units on each layer it is $Ω(\left\lfloor {n}/{n_0}\right\rfloor^{k-1}n^{n_0})$. The number $\left\lfloor{n}/{n_0}\right\rfloor^{k-1}$ grows faster than $k^{n_0}$ when $n$ tends to infinity or when $k$ tends to infinity and $n \geq 2n_0$. Additionally, even when $k$ is small, if we restrict $n$ to be $2n_0$, we can show that a deep model has considerably more linear regions that a shallow one. We consider this as a first step towards understanding the complexity of these models and specifically towards providing suitable mathematical tools for future analysis.