Computational Separation Between Convolutional and Fully-Connected Networks
This work addresses a foundational problem in machine learning theory for researchers and practitioners, providing a computational separation that explains the empirical success of CNNs in vision tasks.
The paper tackles the theoretical gap in understanding why convolutional neural networks outperform fully-connected networks by demonstrating a class of problems that convolutional networks can solve efficiently with gradient-descent, while fully-connected networks of polynomial size struggle to learn them.
Convolutional neural networks (CNN) exhibit unmatched performance in a multitude of computer vision tasks. However, the advantage of using convolutional networks over fully-connected networks is not understood from a theoretical perspective. In this work, we show how convolutional networks can leverage locality in the data, and thus achieve a computational advantage over fully-connected networks. Specifically, we show a class of problems that can be efficiently solved using convolutional networks trained with gradient-descent, but at the same time is hard to learn using a polynomial-size fully-connected network.