NELGMLFeb 28, 2017

The Shattered Gradients Problem: If resnets are the answer, then what is the question?

arXiv:1702.08591v2456 citations
AI Analysis

This addresses a fundamental issue in deep learning for researchers and practitioners by explaining why skip-connections improve training and offering an alternative solution.

The paper identifies the shattered gradients problem, showing that gradients in standard deep networks decay exponentially with depth, resembling white noise, while skip-connections like ResNets resist this decay. It proposes a new 'looks linear' initialization that enables training very deep networks without skip-connections, with preliminary experiments supporting this.

A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although, the problem has largely been overcome via carefully constructed initializations and batch normalization, architectures incorporating skip-connections such as highway and resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch normalization. In this paper, we identify the shattered gradients problem. Specifically, we show that the correlation between gradients in standard feedforward networks decays exponentially with depth resulting in gradients that resemble white noise whereas, in contrast, the gradients in architectures with skip-connections are far more resistant to shattering, decaying sublinearly. Detailed empirical evidence is presented in support of the analysis, on both fully-connected networks and convnets. Finally, we present a new "looks linear" (LL) initialization that prevents shattering, with preliminary experiments showing the new initialization allows to train very deep networks without the addition of skip-connections.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes