LG · Oct 17, 2021

A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization

arXiv:2110.08725v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the theoretical understanding of how batch normalization affects neural network training dynamics. The advance is incremental: it builds on existing mean-field and gradient flow theories.

The authors study the training dynamics of two-layer neural networks with batch normalization (BN) by reformulating them as dynamics on a Riemannian manifold, which identifies BN's effect as a change of metric in parameter space. In the infinite-width limit they derive a mean-field formulation and show that it corresponds to a Wasserstein gradient flow on the manifold. They also analyze the well-posedness and convergence of this flow.

The training dynamics of two-layer neural networks with batch normalization (BN) are studied. They are written as the training dynamics of a neural network without BN on a Riemannian manifold. This identifies BN's effect as a change of the metric in the parameter space. Next, the infinite-width limit of the two-layer neural networks with BN is considered, and a mean-field formulation is derived for the training dynamics. The training dynamics of the mean-field formulation are shown to be the Wasserstein gradient flow on the manifold. Theoretical analysis is provided on the well-posedness and convergence of the Wasserstein gradient flow.
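
One way to build intuition for why BN can be read as a geometric change in parameter space is a small numerical check. The sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a single neuron whose pre-activation is normalized over the batch with no learned scale or shift, a tanh activation, and a squared loss. The names `bn_neuron`, `loss`, and `num_grad` are hypothetical. Under these assumptions the loss does not change when the weight vector is rescaled by a positive factor, so its gradient has no component along the weight itself.

```python
import numpy as np

def bn_neuron(w, X):
    """Single neuron with batch-normalized pre-activation (assumed: no learned scale/shift)."""
    z = X @ w
    z_hat = (z - z.mean()) / np.sqrt(z.var())  # normalize over the batch
    return np.tanh(z_hat)

def loss(w, X, y):
    """Mean squared error of the normalized neuron (an assumed loss for illustration)."""
    return 0.5 * np.mean((bn_neuron(w, X) - y) ** 2)

def num_grad(f, w, h=1e-6):
    """Central finite-difference gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))   # synthetic batch, for illustration only
y = rng.normal(size=64)
w = rng.normal(size=5)

# Rescaling w by a positive constant leaves the loss unchanged.
scale_gap = abs(loss(w, X, y) - loss(3.0 * w, X, y))

# Consequently the gradient is (numerically) orthogonal to w.
g = num_grad(lambda v: loss(v, X, y), w)
radial = abs(g @ w)

print("loss gap under rescaling:", scale_gap)
print("radial gradient component:", radial)
```

Both printed quantities should be close to zero up to floating-point and finite-difference error. Because only the direction of the weight matters in this toy setting, the effective dynamics live on a lower-dimensional set of directions, which is the kind of structure a manifold formulation makes explicit.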

