LGNEMLNov 1, 2018

Stochastic Normalizations as Bayesian Learning

arXiv:1811.00639v116 citations
Originality Incremental advance
AI Analysis

This work addresses the generalization problem in deep learning by providing a Bayesian perspective on normalization, offering a method to enhance performance and uncertainty estimation for practitioners.

The paper investigates why Batch Normalization improves generalization in deep networks, attributing it to randomness in batch statistics, and applies this Bayesian learning interpretation to other normalization methods, achieving comparable test performance and better validation losses for uncertainty estimation.

In this work we investigate the reasons why Batch Normalization (BN) improves the generalization performance of deep networks. We argue that one major reason, distinguishing it from data-independent normalization methods, is randomness of batch statistics. This randomness appears in the parameters rather than in activations and admits an interpretation as a practical Bayesian learning. We apply this idea to other (deterministic) normalization techniques that are oblivious to the batch size. We show that their generalization performance can be improved significantly by Bayesian learning of the same form. We obtain test performance comparable to BN and, at the same time, better validation losses suitable for subsequent output uncertainty estimation through approximate Bayesian posterior.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes