ML · AI · LG · Jun 7, 2021

Batch Normalization Orthogonalizes Representations in Deep Random Networks

arXiv:2106.03970v1 · 45 citations
Originality: Incremental advance
AI Analysis

This work gives theoretical insight into how batch normalization (BN) shapes representation geometry, which may help deep learning practitioners train networks more efficiently. It is rated incremental because it builds on existing analyses of BN.

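The paper shows that batch normalization (BN), interleaved with random linear layers, makes hidden representations in deep random networks increasingly orthogonal across layers: the deviation from orthogonality decays rapidly with depth, up to a residual term inversely proportional to the network width. The authors also observe that when representations start out aligned, stochastic gradient descent (SGD) spends many iterations orthogonalizing them, and that initializing from orthogonal representations is enough to accelerate SGD even without BN.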

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representation -- after the linear layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe SGD wastes many iterations to orthogonalize representations before the classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
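The orthogonalization effect described in the abstract can be illustrated with a small NumPy simulation. This is a minimal sketch under stated assumptions, not the authors' code: the BN variant (per-feature rescaling across the batch, with no mean subtraction and no learned parameters), the gap metric (Frobenius distance between the trace-normalized Gram matrix and I/n), and the names `batch_norm`, `orthogonality_gap`, and `simulate` are all illustrative choices.

```python
import numpy as np


def batch_norm(h, eps=1e-12):
    """Rescale each feature (row) to unit second moment across the batch.

    Assumption: no mean subtraction and no learned scale/shift.
    """
    scale = np.sqrt(np.mean(h ** 2, axis=1, keepdims=True))
    return h / (scale + eps)


def orthogonality_gap(h):
    """Frobenius distance between the trace-normalized Gram matrix and I/n.

    h has shape (width, batch); the gap is 0 when the batch samples are
    mutually orthogonal with equal norms. The metric is an assumption
    chosen for illustration.
    """
    n = h.shape[1]
    gram = h.T @ h
    return np.linalg.norm(gram / np.trace(gram) - np.eye(n) / n)


def simulate(width=256, batch=8, depth=30, seed=0):
    """Track the gap through a deep random linear network with BN."""
    rng = np.random.default_rng(seed)
    # Start from nearly aligned samples: one shared direction plus small noise.
    base = rng.standard_normal((width, 1))
    h = base + 0.1 * rng.standard_normal((width, batch))
    gaps = [orthogonality_gap(h)]
    for _ in range(depth):
        # Fresh Gaussian weights at every layer (a random, untrained network).
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = batch_norm(w @ h)
        gaps.append(orthogonality_gap(h))
    return gaps


if __name__ == "__main__":
    gaps = simulate()
    for layer in (0, 1, 5, 10, 30):
        print(f"layer {layer:2d}: gap = {gaps[layer]:.4f}")
```

In this sketch the gap should fall quickly over the first layers and then level off at a floor set by the finite width. Rerunning `simulate` with a larger `width` is a simple way to check whether that floor shrinks, in line with the width-dependent term in the abstract.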

Code Implementations: 1 repo
