Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks
For deep learning researchers, this paper offers a theoretical account of how BN shapes function geometry during training, but the contribution is incremental because it builds on known geometric properties of continuous piecewise-affine (CPA) networks.
This paper shows that training-time batch normalization (BN) increases the expected local partition refinement (the number of affine regions near the data) in piecewise-affine networks, giving a geometric explanation of BN's effect on the learned function that goes beyond its optimization benefits. The result is proven under explicit sufficient conditions for ReLU and more general CPA networks.
Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.
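The hyperplane geometry described above can be checked numerically for a single ReLU neuron. The following is a minimal sketch under the standard training-time BN definition, not the paper's own construction: the names `w`, `b`, `gamma`, `beta`, and the helper functions are illustrative assumptions, and the window test uses the usual dual-norm condition $|w^\top x_0 - c| \le r\,\|w\|_1$, which may differ in form from the criterion the paper states. Writing the BN output as $\gamma\, w^\top(x - \bar{x})/\sigma + \beta$, the raw bias cancels, the reference hyperplane $w^\top(x - \bar{x}) = 0$ passes through the batch centroid $\bar{x}$, and the ReLU switching hyperplane is its parallel translate at offset $-\beta\sigma/\gamma$.

```python
import numpy as np

def bn_preactivation(x, batch, w, b, gamma=1.0, beta=0.0):
    """Training-time BN of the scalar pre-activation w.x + b (biased variance)."""
    z_batch = batch @ w + b
    mu = z_batch.mean()
    sigma = z_batch.std()  # ddof=0; assumes a non-degenerate batch (sigma > 0)
    return gamma * (x @ w + b - mu) / sigma + beta

def switching_level(batch, w, gamma, beta):
    """Level c such that the ReLU breakpoint set is {x : w.x = c}."""
    sigma = (batch @ w).std()  # the raw bias does not affect sigma
    return batch.mean(axis=0) @ w - beta * sigma / gamma

def hyperplane_hits_window(w, c, x0, r):
    """True iff {x : w.x = c} meets the l_inf ball of radius r around x0."""
    return abs(w @ x0 - c) <= r * np.abs(w).sum()

# Illustrative data: a batch that is far from the origin.
rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, size=(64, 5))
w = rng.normal(size=5)
centroid = batch.mean(axis=0)

# 1) With beta = 0 the zero set passes through the centroid for any raw bias.
at_centroid = [bn_preactivation(centroid, batch, w, b) for b in (-10.0, 0.0, 10.0)]

# 2) The BN output at an arbitrary point does not depend on the raw bias.
x = rng.normal(size=5)
out_lo = bn_preactivation(x, batch, w, b=-10.0, gamma=2.0, beta=0.5)
out_hi = bn_preactivation(x, batch, w, b=10.0, gamma=2.0, beta=0.5)

# 3) The switching hyperplane is a parallel translate of the reference one.
gamma, beta = 2.0, 0.5
c = switching_level(batch, w, gamma, beta)
sigma = (batch @ w).std()
x_star = centroid - (beta * sigma / gamma) * w / (w @ w)  # lies on w.x = c
on_plane = bn_preactivation(x_star, batch, w, b=7.0, gamma=gamma, beta=beta)

# 4) Smallest window radius around the centroid that the hyperplane reaches.
r_min = abs(w @ centroid - c) / np.abs(w).sum()
hits_large = hyperplane_hits_window(w, c, centroid, 2.0 * r_min)
hits_small = hyperplane_hits_window(w, c, centroid, 0.5 * r_min)
```

In this sketch `r_min` equals $|\beta|\sigma / (|\gamma|\,\|w\|_1)$, so the switching hyperplane enters a window centered at the batch centroid at a radius set by the BN parameters and the batch statistics alone, regardless of how the raw bias was initialized.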