Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay
This work provides insight into how features are shaped in neural networks, though it is incremental in that it builds on existing observations of Neural Collapse.
The paper investigates how batch normalization and weight decay influence the emergence of Neural Collapse, a geometric structure in the last-layer features of deep neural networks. It shows that both factors critically affect this structure, supporting the claim with theoretical lower bounds and experimental validation.
Neural Collapse (NC) is a geometric structure recently observed at the terminal phase of training deep neural networks, in which last-layer feature vectors for the same class "collapse" to a single point while features of different classes become equally separated. We demonstrate that batch normalization (BN) and weight decay (WD) critically influence the emergence of NC. In the near-optimal loss regime, we establish an asymptotic lower bound on the emergence of NC that depends only on the WD value, training loss, and the presence of last-layer BN. Our experiments substantiate the theoretical insights by showing that models demonstrate a stronger presence of NC with BN, appropriate WD values, lower loss, and lower last-layer feature norm. Our findings offer a novel perspective for studying the role of BN and WD in shaping neural network features.
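To make the geometric description concrete, the following is a minimal sketch of one way to quantify how close a set of last-layer features is to Neural Collapse, using average pairwise cosine similarity. It is an illustration and not necessarily the measure used in the paper: the function name `nc_cosine_stats` is hypothetical, centering by the global feature mean is an assumption, and the toy features are constructed by hand rather than taken from a trained network. For perfectly collapsed features with C balanced classes, same-class cosine similarity is 1 and different-class cosine similarity is -1/(C-1).

```python
import math


def nc_cosine_stats(features, labels):
    """Return (mean intra-class cosine, mean inter-class cosine).

    Hypothetical helper for illustration. Features are centered by their
    global mean first (an assumption of this sketch). Every centered
    feature must be non-zero, otherwise the cosine is undefined.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    centered = [[f[d] - mean[d] for d in range(dim)] for f in features]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    intra, inter = [], []
    for i in range(n):
        for j in range(i + 1, n):
            c = cosine(centered[i], centered[j])
            # Same-class pairs measure collapse; cross-class pairs measure separation.
            (intra if labels[i] == labels[j] else inter).append(c)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Idealized, fully collapsed features for C = 3 classes, two samples per class.
# Each class mean is a centered one-hot vector, which forms a simplex ETF.
C = 3
class_means = [[(1.0 if k == c else 0.0) - 1.0 / C for k in range(C)]
               for c in range(C)]
features = [class_means[c] for c in range(C) for _ in range(2)]
labels = [c for c in range(C) for _ in range(2)]

intra, inter = nc_cosine_stats(features, labels)
print(f"intra-class cosine: {intra:.3f}, inter-class cosine: {inter:.3f}")
```

On real features, an intra-class value near 1 together with an inter-class value near -1/(C-1) would indicate a stronger presence of NC under this particular measure. The pairwise loop is quadratic in the number of samples, so a vectorized version would be preferable for anything beyond small checks.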