NE LG MLApr 26, 2016

Scale Normalization

arXiv:1604.07796v11.5

Originality Incremental advance

AI Analysis

This addresses a fundamental training issue for deep learning practitioners, but appears incremental as it builds on existing scale-preserving initialization techniques.

The paper tackled the problem of improper scaling between layers in deep neural networks, which causes exploding/vanishing gradients, by proposing two methods to maintain isometry beyond initial weights, resulting in faster learning as shown in preliminary experiments.

One of the difficulties of training deep neural networks is caused by improper scaling between layers. Scaling issues introduce exploding / gradient problems, and have typically been addressed by careful scale-preserving initialization. We investigate the value of preserving scale, or isometry, beyond the initial weights. We propose two methods of maintaing isometry, one exact and one stochastic. Preliminary experiments show that for both determinant and scale-normalization effectively speeds up learning. Results suggest that isometry is important in the beginning of learning, and maintaining it leads to faster learning.

View on arXiv PDF

Similar