LG, NE · Oct 21, 2020

Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence

arXiv:2010.10687v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for effective normalization methods in deep learning that avoid batch dependence, which matters in settings such as online learning or training with small batch sizes. The advance is incremental: it builds on existing normalization techniques.

The paper asks what makes Batch Normalization perform well and whether those benefits can be reproduced without batch dependence. It identifies statistical properties linked to Batch Norm's success and proposes two new normalizers, PreLayerNorm and RegNorm. Both recover much of Batch Norm's performance, outperform LayerNorm, and can be applied in batch-independent scenarios.

We perform an extensive empirical study of the statistical properties of Batch Norm and other common normalizers. This includes an examination of the correlation between representations of minibatches, gradient norms, and Hessian spectra both at initialization and over the course of training. Through this analysis, we identify several statistical properties which appear linked to Batch Norm's superior performance. We propose two simple normalizers, PreLayerNorm and RegNorm, which better match these desirable properties without involving operations along the batch dimension. We show that PreLayerNorm and RegNorm achieve much of the performance of Batch Norm without requiring batch dependence, that they reliably outperform LayerNorm, and that they can be applied in situations where Batch Norm is ineffective.
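
To make "batch dependence" concrete, here is a minimal sketch in plain Python. It is not the paper's PreLayerNorm or RegNorm, which this summary does not define. It only contrasts the normalization axes of textbook Batch Norm (training mode) and LayerNorm, with the learned scale and shift omitted. The function names and toy inputs are illustrative assumptions.

```python
import math

EPS = 1e-5  # small constant for numerical stability (assumed value)

def _standardize(values, eps=EPS):
    """Shift to zero mean and scale to (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_norm(batch):
    """Normalize each feature using statistics taken across the batch."""
    columns = [_standardize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

def layer_norm(batch):
    """Normalize each example using statistics over its own features."""
    return [_standardize(row) for row in batch]

# The same example x placed in two batches with different batch-mates.
x = [1.0, 2.0, 4.0]
batch_a = [x, [0.0, 0.0, 0.0], [3.0, 1.0, 2.0]]
batch_b = [x, [10.0, -5.0, 7.0], [3.0, 1.0, 2.0]]

# LayerNorm: the output for x ignores the rest of the batch.
ln_same = layer_norm(batch_a)[0] == layer_norm(batch_b)[0]
# Batch Norm: the output for x changes when its batch-mates change.
bn_same = batch_norm(batch_a)[0] == batch_norm(batch_b)[0]
# With a batch of one, batch statistics carry no information:
# every feature equals its own mean, so the output collapses to zeros.
bn_single = batch_norm([x])[0]
```

In this sketch `ln_same` is true and `bn_same` is false, while `bn_single` is all zeros. That collapse is one simple way to see why batch-statistics normalizers struggle with very small batches or online learning.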

