LGMLOct 3, 2020

Computational Separation Between Convolutional and Fully-Connected Networks

arXiv:2010.01369v132 citations
Originality Highly original
AI Analysis

This work addresses a foundational problem in machine learning theory for researchers and practitioners, providing a computational separation that explains the empirical success of CNNs in vision tasks.

The paper tackles the theoretical gap in understanding why convolutional neural networks outperform fully-connected networks by demonstrating a class of problems that convolutional networks can solve efficiently with gradient-descent, while fully-connected networks of polynomial size struggle to learn them.

Convolutional neural networks (CNN) exhibit unmatched performance in a multitude of computer vision tasks. However, the advantage of using convolutional networks over fully-connected networks is not understood from a theoretical perspective. In this work, we show how convolutional networks can leverage locality in the data, and thus achieve a computational advantage over fully-connected networks. Specifically, we show a class of problems that can be efficiently solved using convolutional networks trained with gradient-descent, but at the same time is hard to learn using a polynomial-size fully-connected network.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes