LG · Apr 21

ZC-Swish: Stabilizing Deep BN-Free Networks for Edge and Micro-Batch Applications

arXiv:2604.19453
AI Analysis

For practitioners deploying deep networks in micro-batch or federated learning settings where Batch Normalization is infeasible, ZC-Swish offers a simple drop-in fix to prevent training collapse.

The paper argues that standard activations such as Swish and ReLU worsen instability in deep BN-free networks because their outputs are not zero-centered, so activation means drift further from zero with each added layer. It proposes ZC-Swish, a parameterized drop-in activation that anchors activation means near zero. In stress tests on BN-free CNNs at depths 8, 16, and 32, ZC-Swish trains stably and reaches 51.5% test accuracy at depth 16 (reported for seed 42), while standard Swish collapses to near-random performance at depth 16 and beyond.

Batch Normalization (BN) is a cornerstone of deep learning, yet it fundamentally breaks down in micro-batch regimes (e.g., 3D medical imaging) and non-IID Federated Learning. Removing BN from deep architectures, however, often leads to catastrophic training failures such as vanishing gradients and dying channels. We identify that standard activation functions, like Swish and ReLU, exacerbate this instability in BN-free networks due to their non-zero-centered nature, which causes compounding activation mean-shifts as network depth increases. In this technical communication, we propose Zero-Centered Swish (ZC-Swish), a drop-in activation function parameterized to dynamically anchor activation means near zero. Through targeted stress-testing on BN-free convolutional networks at depths 8, 16, and 32, we demonstrate that while standard Swish collapses to near-random performance at depth 16 and beyond, ZC-Swish maintains stable layer-wise activation dynamics and achieves the highest test accuracy at depth 16 (51.5%) with seed 42. ZC-Swish thus provides a robust, parameter-efficient solution for stabilizing deep networks in memory-constrained and privacy-preserving applications where traditional normalization is unviable.
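
The non-zero-centered behaviour described above can be checked numerically. The sketch below, in plain Python, feeds zero-mean Gaussian inputs through standard Swish and shows that the outputs have a clearly positive mean (around 0.2). It then applies a hypothetical zero-centered variant that subtracts an offset. The abstract does not give ZC-Swish's actual parameterization, so `zc_swish_sketch` and its `shift` parameter are illustrative assumptions only, with the offset estimated from the sample rather than learned.

```python
import math
import random


def swish(x: float) -> float:
    """Standard Swish (SiLU): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def zc_swish_sketch(x: float, shift: float) -> float:
    """Hypothetical zero-centered variant: Swish minus an offset.

    ASSUMPTION: this form is for illustration only and is not taken
    from the paper. In a real layer `shift` would presumably be a
    learnable parameter rather than a fixed constant.
    """
    return swish(x) - shift


def mean(values):
    return sum(values) / len(values)


# Zero-mean, unit-variance inputs, as a stand-in for pre-activations.
rng = random.Random(42)
inputs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Swish pushes the mean of zero-centered inputs above zero.
swish_mean = mean([swish(x) for x in inputs])

# Here the offset is simply set to the observed Swish mean.
shift = swish_mean
zc_mean = mean([zc_swish_sketch(x, shift) for x in inputs])

print(f"mean of Swish outputs:     {swish_mean:.4f}")
print(f"mean after offset removal: {zc_mean:.2e}")
```

Without normalization, each layer passes this positive offset on to the next, which is the compounding mean-shift the abstract attributes to depth. A learnable offset is only one plausible way to counter it; the paper's own formulation may differ.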

