LGAI · Sep 27, 2025

Signal Preserving Weight Initialization for Odd-Sigmoid Activations

arXiv:2509.23085v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the interdependence between activation functions and weight initialization, a practical concern for neural network practitioners. The advance is incremental because it focuses on a specific activation class.

The paper tackles the problem of activation function saturation and variance collapse by proposing a weight initialization method specifically designed for odd sigmoid activations, which enables reliable training without normalization layers and improves data efficiency.

Activation functions critically influence trainability and expressivity, and recent work has therefore explored a broad range of nonlinearities. However, activations and weight initialization are interdependent: without an appropriate initialization method, nonlinearities can cause saturation, variance collapse, and increased learning rate sensitivity. We address this by defining an odd sigmoid function class and, given any activation f in this class, proposing an initialization method tailored to f. The method selects a noise scale in closed form so that forward activations remain well dispersed up to a target layer, thereby avoiding collapse to zero or saturation. Empirically, the approach trains reliably without normalization layers, exhibits strong data efficiency, and enables learning for activations under which standard initialization methods (Xavier, He, Orthogonal) often do not converge reliably.
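The two failure modes named in the abstract, collapse to zero and saturation, can be illustrated with a small forward-pass experiment. The sketch below is not the paper's closed-form method: it uses standard tanh as a stand-in for an odd sigmoid activation, and the `scale` parameter, the function name `second_moments`, and the layer sizes are all hypothetical choices made for illustration. It draws Gaussian weights with standard deviation `scale / sqrt(width)` and records the mean squared activation after each layer.

```python
import math
import random


def second_moments(scale, depth=10, width=32, n_samples=16, seed=0):
    """Mean squared activation per layer of a random tanh network.

    Weights are i.i.d. N(0, (scale / sqrt(width))**2). `scale` is an
    illustrative knob, not the noise scale derived in the paper.
    """
    rng = random.Random(seed)
    # Standard-normal inputs: one row per sample.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n_samples)]
    std = scale / math.sqrt(width)
    moments = []
    for _ in range(depth):
        # Fresh square weight matrix for this layer (no bias term).
        weights = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        xs = [
            [math.tanh(sum(w * x for w, x in zip(row, sample))) for row in weights]
            for sample in xs
        ]
        total = sum(v * v for sample in xs for v in sample)
        moments.append(total / (n_samples * width))
    return moments


if __name__ == "__main__":
    for scale in (0.5, 1.0, 5.0):
        final = second_moments(scale)[-1]
        print(f"scale={scale}: mean squared activation at last layer = {final:.3e}")
```

With a small scale the signal shrinks layer by layer toward zero, while a large scale pushes most units toward the saturated ends of tanh. An initialization in the spirit of the abstract would choose the scale so that activations stay between these extremes up to a target depth.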

