LG · ML · May 13, 2020

The effect of Target Normalization and Momentum on Dying ReLU

arXiv:2005.06195v1 · 18 citations
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental optimization issue in neural networks that is relevant to both researchers and practitioners, but the advance is incremental: it builds on known mitigations to better understand the underlying causes.

The paper investigates how target normalization and momentum affect the dying ReLU problem in neural networks, finding empirically that low target variance increases ReLU death and showing theoretically how momentum can drive parameters into regions where ReLUs die, with the issue persisting in deeper models like residual networks.

Optimizing parameters with momentum, normalizing data values, and using rectified linear units (ReLUs) are popular choices in neural network (NN) regression. Although ReLUs are popular, they can collapse to a constant function and "die", effectively removing their contribution from the model. While some mitigations are known, the underlying reasons for ReLUs dying during optimization are currently poorly understood. In this paper, we consider the effects of target normalization and momentum on dying ReLUs. We find empirically that unit variance targets are well motivated and that ReLUs die more easily when target variance approaches zero. To further investigate this matter, we analyze a discrete-time linear autonomous system, and show theoretically how this relates to a model with a single ReLU and how common properties can result in dying ReLU. We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLU and how parameters evolve into these regions when momentum is used. Finally, we show empirically that this problem persists, and is aggravated, for deeper models including residual networks.
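
The ingredients named in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: it assumes a scalar single-ReLU model of the form y_hat = max(0, w*x + b), a mean squared error loss, and a heavy-ball momentum update. The helper names (standardize, is_dead, momentum_step) and the default hyperparameters are hypothetical.

```python
import statistics


def standardize(targets):
    """Shift and scale targets to zero mean and unit (population) variance."""
    mean = statistics.mean(targets)
    std = statistics.pstdev(targets)
    if std == 0.0:
        raise ValueError("targets are constant, so they cannot be rescaled")
    return [(t - mean) / std for t in targets]


def is_dead(w, b, inputs):
    """Return True if the unit outputs zero for every input in the dataset.

    Assumed model: y_hat = max(0, w*x + b). When w*x + b <= 0 for all x,
    the output is the constant 0 and the gradient with respect to w and b
    is zero as well.
    """
    return all(w * x + b <= 0.0 for x in inputs)


def momentum_step(w, b, vw, vb, inputs, targets, lr=0.1, beta=0.9):
    """One heavy-ball momentum step on the mean squared error.

    lr and beta are assumed defaults, not values from the paper.
    """
    n = len(inputs)
    gw = gb = 0.0
    for x, y in zip(inputs, targets):
        z = w * x + b
        if z > 0.0:  # only active samples contribute to the gradient
            err = z - y
            gw += 2.0 * err * x / n
            gb += 2.0 * err / n
    # The velocity keeps a decaying memory of past gradients.
    vw = beta * vw - lr * gw
    vb = beta * vb - lr * gb
    return w + vw, b + vb, vw, vb
```

In this sketch, a unit that is already dead and has zero velocity stays where it is, because no sample contributes a gradient. If the velocity is nonzero, the parameters keep moving for some steps even though the gradient is zero, which is one way to picture how momentum can carry parameters into or further through a dead region.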

