CV AI LG Nov 19, 2018

Do Normalization Layers in a Deep ConvNet Really Need to Be Distinct?

arXiv:1811.07727v1
7 citations
Originality: Incremental advance
AI Analysis

This work addresses a foundational question in deep learning for researchers and practitioners, but it is incremental as it builds on existing normalization methods.

The study tackled the problem of whether different normalization layers in a convolutional neural network require distinct normalizers, finding that using distinct normalizers improves learning and generalization, with choices influenced by depth and batch size.

Yes, they do. This work investigates a question for deep learning: whether different normalization layers in a ConvNet require different normalizers. It is a first step towards understanding this phenomenon. We follow each convolutional layer with a switchable normalization (SN) layer that learns to choose a normalizer from a pool of normalization methods. Through systematic experiments on ImageNet, COCO, Cityscapes, and ADE20K, we answer three questions: (a) Is it useful to allow each normalization layer to select its own normalizer? (b) What affects the choice of normalizers? (c) Do different tasks and datasets prefer different normalizers? Our results suggest that (1) using distinct normalizers improves both learning and generalization of a ConvNet; (2) the choice of normalizers depends more on depth and batch size than on parameter initialization, learning rate decay, and solver; (3) different tasks and datasets behave differently when learning to select normalizers.
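The idea of a layer that "learns to choose a normalizer from a pool" can be illustrated with a minimal NumPy sketch. The abstract does not specify the pool or the selection mechanism, so the following are assumptions made purely for illustration: the pool consists of instance, layer, and batch normalization, and the choice is a softmax over learnable logits that mixes their means and variances. The names `switchable_norm`, `mean_logits`, and `var_logits` are hypothetical, and only the forward pass is shown (no training of the logits).

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def switchable_norm(x, mean_logits, var_logits, gamma=1.0, beta=0.0, eps=1e-5):
    """Forward pass of a hypothetical switchable normalizer.

    x has shape (N, C, H, W). The pool (instance, layer, batch) and the
    softmax mixing are assumptions of this sketch, not details taken from
    the abstract above.
    """
    # Instance norm statistics: per sample, per channel.
    mu_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    # Layer norm statistics: per sample, over all channels and positions.
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    # Batch norm statistics: per channel, over the whole batch.
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)

    # Importance weights; in a real layer the logits would be learned.
    w_mu = softmax(mean_logits)
    w_var = softmax(var_logits)

    # Broadcasting combines the three sets of statistics to (N, C, 1, 1).
    mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return gamma * (x - mu) / np.sqrt(var + eps) + beta


# Usage: equal logits weight all three normalizers equally.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(4, 3, 8, 8))
y = switchable_norm(x, np.zeros(3), np.zeros(3))
```

Under these assumptions, pushing one logit far above the others makes the layer behave like that single normalizer, which is one way a layer could end up with its own distinct choice.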

Code Implementations: 1 repo
