LG · OC · ML · Jun 20, 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

arXiv:2306.11680v2 · 8 citations · h-index: 64
Originality: Incremental advance
AI Analysis

This paper provides theoretical insight into batch normalization for machine learning researchers. It is an incremental advance that builds on existing studies of implicit bias.

The paper analyzes the implicit bias of batch normalization in linear models and two-layer linear CNNs. It shows that gradient descent converges to a uniform margin classifier at an exp(-Ω(log² t)) rate, which differs from models without batch normalization in both the type of implicit bias and the convergence rate. It also gives two examples in which patch-wise uniform margin classifiers outperform maximum margin classifiers.

We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in terms of both the type of implicit bias and the convergence rate. We further extend our result to a class of two-layer, single-filter linear convolutional neural networks, and show that batch normalization has an implicit bias towards a patch-wise uniform margin. Based on two examples, we demonstrate that patch-wise uniform margin classifiers can outperform the maximum margin classifiers in certain learning problems. Our results contribute to a better theoretical understanding of batch normalization.
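
The uniform margin claim can be illustrated with a small numerical sketch. This is not the paper's code. It assumes a batch-normalized linear model of the form f(x) = γ · ⟨w, x⟩ / sqrt(mean_j ⟨w, x_j⟩²) with no mean subtraction, the logistic loss, full-batch gradient descent, and synthetic Gaussian data with fewer samples than dimensions so that a uniform margin solution exists. All function names, the step size, and the step count are hypothetical choices made for illustration.

```python
import numpy as np


def normalized_margins(w, gamma, Z):
    """Return sign(gamma) * y_i <w, x_i> / rms_j(<w, x_j>) for every sample.

    Z holds the rows y_i * x_i. A uniform margin classifier gives the
    same value for every entry.
    """
    z = Z @ w
    return np.sign(gamma) * z / np.sqrt(np.mean(z ** 2))


def train_bn_linear(Z, w0, gamma0=1.0, steps=20000, lr=0.1):
    """Full-batch gradient descent on the logistic loss of the assumed model.

    Gradients are written out by hand, so only NumPy is needed.
    """
    n = Z.shape[0]
    w, gamma = w0.copy(), gamma0
    for _ in range(steps):
        z = Z @ w                        # unnormalized margins
        s = np.sqrt(np.mean(z ** 2))     # normalization scale over the batch
        m = gamma * z / s                # normalized model outputs times labels
        dl = -1.0 / (1.0 + np.exp(m))    # derivative of log(1 + exp(-m)) in m
        # Chain rule through the normalization: ds/dw = Z^T z / (n * s).
        grad_w = gamma * (Z.T @ dl / s - (dl @ z) * (Z.T @ z) / (n * s ** 3)) / n
        grad_gamma = np.mean(dl * z) / s
        w = w - lr * grad_w
        gamma = gamma - lr * grad_gamma
    return w, gamma


def uniform_margin_demo(n=4, d=20, seed=0):
    """Train on synthetic data and return margins before and after training."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))              # assumed synthetic inputs
    y = rng.choice([-1.0, 1.0], size=n)      # assumed random labels
    Z = X * y[:, None]
    w0 = rng.normal(size=d) / np.sqrt(d)
    w, gamma = train_bn_linear(Z, w0)
    return normalized_margins(w0, 1.0, Z), normalized_margins(w, gamma, Z)


if __name__ == "__main__":
    before, after = uniform_margin_demo()
    print("margin spread before:", before.max() - before.min())
    print("margin spread after: ", after.max() - after.min())
```

Under these assumptions, the spread between the largest and smallest normalized margin should shrink during training, which is the qualitative behavior the abstract describes. The sketch does not attempt to verify the stated convergence rate.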
