LGJun 2, 2017

Weight Sharing is Crucial to Succesful Optimization

arXiv:1706.00687v132 citations
Originality Incremental advance
AI Analysis

This work addresses the optimization problem for deep learning practitioners by providing theoretical justification for weight sharing, though it is incremental as it builds on existing knowledge about architectures.

The paper tackles the optimization challenge in training deep neural networks by proving that weight sharing is crucial for exploiting low-frequency components in target functions, with theoretical results supported by empirical experiments.

Exploiting the great expressive power of Deep Neural Network architectures, relies on the ability to train them. While current theoretical work provides, mostly, results showing the hardness of this task, empirical evidence usually differs from this line, with success stories in abundance. A strong position among empirically successful architectures is captured by networks where extensive weight sharing is used, either by Convolutional or Recurrent layers. Additionally, characterizing specific aspects of different tasks, making them "harder" or "easier", is an interesting direction explored both theoretically and empirically. We consider a family of ConvNet architectures, and prove that weight sharing can be crucial, from an optimization point of view. We explore different notions of the frequency, of the target function, proving necessity of the target function having some low frequency components. This necessity is not sufficient - only with weight sharing can it be exploited, thus theoretically separating architectures using it, from others which do not. Our theoretical results are aligned with empirical experiments in an even more general setting, suggesting viability of examination of the role played by interleaving those aspects in broader families of tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes