CV · AI · LG · Mar 21, 2022

Delving into the Estimation Shift of Batch Normalization in a Network

arXiv:2203.10778v1 · 26 citations · h-index: 63
Originality: Incremental advance
AI Analysis

This paper addresses a specific technical issue in deep learning normalization that matters to researchers and practitioners, and offers an incremental improvement over existing methods.

The paper tackles the accumulation of estimation shift in batch normalization across deep networks, which harms test performance, and proposes XBNBlock to block this accumulation, achieving significant performance improvements on ImageNet and COCO benchmarks.

Batch normalization (BN) is a milestone technique in deep learning. It normalizes activations using mini-batch statistics during training but uses estimated population statistics during inference. This paper focuses on investigating the estimation of population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and the expected ones. Our primary observation is that the estimation shift can accumulate as BN layers are stacked in a network, which has detrimental effects on test performance. We further find that a batch-free normalization (BFN) can block such an accumulation of estimation shift. These observations motivate our design of XBNBlock, which replaces one BN with BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin and seems to be more robust to distribution shift.
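To make the train/inference distinction in the abstract concrete, here is a minimal NumPy sketch under stated assumptions. It is not the paper's code. It shows BN normalizing with mini-batch statistics while keeping a running estimate of the population statistics, and a per-sample (layer-norm style) normalization as one possible batch-free normalization. The function names, the momentum value, the synthetic Gaussian data, and in particular `relative_shift` are illustrative assumptions: the abstract does not give the formula for the estimation shift magnitude, so a plain relative L2 distance stands in for it here.

```python
import numpy as np


def batch_norm_train(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Normalize x of shape (batch, features) with mini-batch statistics and
    update the running (estimated population) statistics by a moving average."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # biased variance, a simplification for this sketch
    x_hat = (x - mu) / np.sqrt(var + eps)
    new_mean = (1 - momentum) * running_mean + momentum * mu
    new_var = (1 - momentum) * running_var + momentum * var
    return x_hat, new_mean, new_var


def batch_norm_eval(x, running_mean, running_var, eps=1e-5):
    """At inference, normalize with the estimated population statistics."""
    return (x - running_mean) / np.sqrt(running_var + eps)


def batch_free_norm(x, eps=1e-5):
    """Per-sample normalization over features: no dependence on the batch."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def relative_shift(estimated, expected):
    """Hypothetical proxy for an estimation shift: relative L2 distance.
    This is an assumption, not the definition used in the paper."""
    return np.linalg.norm(estimated - expected) / np.linalg.norm(expected)


# Synthetic data with known population statistics (assumed for illustration).
rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0, 0.5, 3.0])
true_std = np.array([1.0, 0.5, 2.0, 1.5])

running_mean = np.zeros(4)
running_var = np.ones(4)
for _ in range(200):
    batch = true_mean + true_std * rng.standard_normal((8, 4))
    _, running_mean, running_var = batch_norm_train(batch, running_mean, running_var)

mean_shift = relative_shift(running_mean, true_mean)
var_shift = relative_shift(running_var, true_std ** 2)
print(f"mean shift: {mean_shift:.3f}, variance shift: {var_shift:.3f}")

# A batch-free normalization gives a sample the same output whether it is
# processed alone or together with other samples.
sample = true_mean + true_std * rng.standard_normal((1, 4))
others = true_mean + true_std * rng.standard_normal((7, 4))
alone = batch_free_norm(sample)
in_batch = batch_free_norm(np.vstack([sample, others]))[:1]
print("batch-free output independent of batch:", np.allclose(alone, in_batch))
```

The sketch covers a single layer only. The accumulation across stacked BN layers that the abstract describes would need a multi-layer network to observe, which is outside the scope of this example.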

Code Implementations: 1 repo