LG · ML · Jul 22, 2019

Channel Normalization in Convolutional Neural Network avoids Vanishing Gradients

arXiv:1907.09539v1 · 24 citations
Originality: Incremental advance
AI Analysis

This paper addresses a critical optimization challenge for researchers who use deep networks to solve inverse problems, as in the deep image prior. The advance is incremental because it builds on existing normalization techniques.

The paper tackles the problem of vanishing gradients in deep convolutional neural networks trained on a single example, showing that channel normalization prevents this issue, whereas without it, gradient descent requires exponentially many steps to approach an optimum.

Normalization layers are widely used in deep neural networks to stabilize training. In this paper, we consider the training of convolutional neural networks with gradient descent on a single training example. This optimization problem arises in recent approaches for solving inverse problems, such as the deep image prior or the deep decoder. We show that for this setup, channel normalization, which centers and normalizes each channel individually, avoids vanishing gradients, whereas without normalization, gradients vanish, which prevents efficient optimization. This effect already occurs in deep single-channel linear convolutional networks, and we show that without channel normalization, gradient descent takes at least exponentially many steps to come close to an optimum. In contrast, with channel normalization, the gradients remain bounded, thus avoiding exploding gradients.
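To make "centers and normalizes each channel individually" concrete, here is a minimal NumPy sketch of the operation for a single example. It is an illustration, not the authors' implementation. The function name `channel_normalize`, the `(channels, height, width)` layout, and the stabilizing constant `eps` are assumptions, and any learnable scale and shift parameters are omitted.

```python
import numpy as np


def channel_normalize(x, eps=1e-5):
    """Center and scale each channel of a single example independently.

    x is assumed to have shape (channels, height, width); eps is a
    hypothetical small constant that guards against division by zero.
    """
    # Statistics are taken over the spatial axes only, so every channel
    # gets its own mean and variance.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


# Example: four channels with a non-zero mean and non-unit variance.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 8))
y = channel_normalize(x)
```

After the call, each channel of `y` has approximately zero mean and unit variance, regardless of the scale of the corresponding channel in `x`.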
