CV, LG, IV | Oct 2, 2020

Weight and Gradient Centralization in Deep Neural Networks

arXiv:2010.00866v3 | 20 citations
Originality: Synthesis-oriented
AI Analysis

This work offers deep learning practitioners an incremental improvement: better network generalization with no added runtime overhead at deployment, because the methods are applied only during training.

The paper tackles the problem of improving generalization in deep neural networks by combining weight and gradient normalization methods, achieving increased generalization without affecting inference runtime.

Batch normalization is currently the most widely used form of internal normalization in deep neural networks. Additional work has shown that normalizing and further conditioning the weights, as well as normalizing the gradients, further improves generalization. In this work, we combine several of these methods and thereby increase the generalization of the networks. Compared to batch normalization, the newer methods offer not only increased generalization but also the advantage that they only have to be applied during training and therefore do not affect the running time during use. Link to CUDA code: https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
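
To make the idea of training-only centralization concrete, the following is a minimal sketch in plain Python, not the authors' CUDA implementation. It assumes one common formulation: each row of a weight matrix holds the weights of one output unit, and centralization subtracts the row mean. The helper names `centralize` and `sgd_step`, the learning rate, and the toy values are all hypothetical and chosen for illustration; the paper's exact axes and ordering of operations may differ.

```python
def centralize(matrix):
    """Subtract the row mean from each row (one row per output unit)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def sgd_step(weights, grads, lr=0.1):
    """One plain SGD step with gradient and weight centralization.

    Assumed order (illustrative only): centralize the gradient,
    apply the update, then centralize the updated weights.
    """
    grads = centralize(grads)
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return centralize(updated)


# Toy 2x3 weight matrix and gradient (made-up numbers).
weights = [[0.5, -0.2, 0.3], [1.0, 1.0, 4.0]]
grads = [[0.1, 0.4, 0.1], [-0.3, 0.0, 0.3]]
new_weights = sgd_step(weights, grads)
```

Because both operations live inside the training step, the stored weights need no extra computation at inference time, which is consistent with the abstract's claim that running time during use is unaffected.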
