Learning Functions: When Is Deep Better Than Shallow
This paper addresses the theoretical question of why deep networks outperform shallow ones on compositional functions, a question that is foundational for machine learning practitioners and researchers.
The paper proves that deep networks can approximate compositional functions as accurately as shallow networks but with exponentially fewer parameters and lower VC-dimension, settling Bengio's conjecture on depth, and defines a class of algorithms to justify deep convolutional networks.
While the universal approximation property holds for both hierarchical and shallow networks, we prove that deep (hierarchical) networks can approximate the class of compositional functions with the same accuracy as shallow networks but with an exponentially lower number of training parameters and an exponentially lower VC-dimension. This theorem settles an old conjecture by Bengio on the role of depth in networks. We then define a general class of scalable, shift-invariant algorithms to show a simple and natural set of requirements that justify deep convolutional networks.
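To make the notion of a compositional function concrete, the following minimal sketch evaluates a function of n variables built as a binary tree of two-input constituent functions. It is an illustration under stated assumptions, not the paper's construction: the binary-tree layout, the names `compose_binary_tree` and `count_constituents`, and the constituent `h` are all hypothetical choices made for this example.

```python
from typing import Callable, Sequence

# A constituent function takes two real inputs and returns one real output.
Constituent = Callable[[float, float], float]


def compose_binary_tree(xs: Sequence[float], h: Constituent) -> float:
    """Evaluate a binary-tree compositional function on the inputs xs.

    Assumption for this sketch: len(xs) is a power of two, and the same
    constituent h is reused at every node of the tree.
    """
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("number of inputs must be a power of two")
    level = list(xs)
    # Each pass pairs up neighbouring values, halving the width of the level.
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def count_constituents(n: int) -> int:
    """Number of two-input nodes in a full binary tree with n leaves."""
    return n - 1


def h(a: float, b: float) -> float:
    # Hypothetical constituent, chosen only for illustration.
    return a * b + 1.0


if __name__ == "__main__":
    print(compose_binary_tree([1.0, 2.0, 3.0, 4.0], h))  # 40.0
    print(count_constituents(8))  # 7
```

Every node in this tree depends on only two variables, no matter how large n is. A deep network whose layers mirror the tree can exploit that local structure, whereas a shallow network has to treat the target as a generic function of all n inputs.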