Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

arXiv:2602.17063v1 | 1.4 | h-index: 29

Originality: Incremental advance

AI Analysis

This work addresses a fundamental bottleneck in extreme model compression for the efficient deployment of large neural networks, although the contribution appears incremental because it builds on existing compression methods.

The paper tackles sub-bit model compression, where the sign bit becomes a storage bottleneck, and shows that weight signs in neural networks largely persist from random initialization, with only rare flips. The authors introduce sign lock-in theory to explain this behavior and propose a gap-based initialization and a regularizer that together reduce the sign flip rate to approximately 10^-3, at the cost of only about a one-point increase in perplexity.
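A toy illustration of how a flip rate can be measured and why a gap around zero suppresses flips. Several assumptions here are not from the paper: the flip rate is taken as the net fraction of weights whose final sign differs from their initial sign (which may differ from the authors' definition of an effective flip); SGD noise is replaced by a random walk in which each step moves every weight by exactly +eta or -eta; and `sign_flip_rate`, `simulate_bounded_updates`, and `gap_init` are hypothetical helpers, not the authors' code. Under these assumptions, a weight that starts farther than steps * eta from zero can never change sign.

```python
import numpy as np


def sign_flip_rate(w_init: np.ndarray, w_final: np.ndarray) -> float:
    """Net fraction of weights whose sign changed between init and end of training.

    Assumption: exact zeros count as positive, so each weight has one sign bit.
    """
    s_init = np.where(w_init >= 0, 1, -1)
    s_final = np.where(w_final >= 0, 1, -1)
    return float(np.mean(s_init != s_final))


def simulate_bounded_updates(w0: np.ndarray, steps: int, eta: float, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for SGD noise: each step adds +eta or -eta to every weight.

    Every update is bounded by eta, so no weight moves more than steps * eta in total.
    """
    rng = np.random.default_rng(seed)
    w = w0.astype(np.float64, copy=True)
    for _ in range(steps):
        w += eta * rng.choice([-1.0, 1.0], size=w.shape)
    return w


def gap_init(n: int, gap: float, scale: float, seed: int = 0) -> np.ndarray:
    """Hypothetical gap-based init: random signs, magnitudes at least `gap` from zero."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)
    magnitudes = gap + np.abs(rng.normal(0.0, scale, size=n))
    return signs * magnitudes


# Usage: the total drift per weight is at most steps * eta = 1.0.
n, steps, eta = 10_000, 100, 0.01
w_gauss = np.random.default_rng(1).normal(0.0, 1.0, size=n)  # standard init, mass near zero
w_gap = gap_init(n, gap=1.5, scale=1.0)                       # gap exceeds the maximum drift

print(sign_flip_rate(w_gauss, simulate_bounded_updates(w_gauss, steps, eta)))  # small but nonzero
print(sign_flip_rate(w_gap, simulate_bounded_updates(w_gap, steps, eta)))      # 0.0
```

In the Gaussian case, only weights that start within roughly steps * eta of zero can flip, which mirrors the claim that flips come from rare near-zero boundary crossings. The toy walk carries no gradient signal, so it says nothing about the perplexity cost.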

Sub-bit model compression seeks storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. Rademacher baseline. Despite this apparent randomness, most weights retain their initialization signs; flips primarily occur via rare near-zero boundary crossings, suggesting that sign-pattern randomness is largely inherited from initialization. We formalize this behavior with sign lock-in theory, a stopping-time analysis of sign flips under SGD noise. Under bounded updates and a rare re-entry condition into a small neighborhood around zero, the number of effective sign flips exhibits a geometric tail. Building on this mechanism, we introduce a gap-based initialization and a lightweight outward-drift regularizer, reducing the effective flip rate to approximately $10^{-3}$ with only about a one-point increase in perplexity.
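A minimal sketch of the low-rank diagnostic mentioned in the abstract, assuming that "resisting low-rank approximation" is measured by the relative Frobenius error of the best rank-k approximation (computed from the SVD, per the Eckart-Young theorem) and compared against an i.i.d. Rademacher matrix of the same shape. The helper names are hypothetical. The Gaussian matrix below is only a stand-in for a trained layer: its sign pattern is i.i.d. Rademacher by construction, so the two errors should nearly match. A real check would load trained weights instead.

```python
import numpy as np


def sign_matrix(w: np.ndarray) -> np.ndarray:
    """+/-1 sign pattern of a weight matrix (zeros mapped to +1 by assumption)."""
    return np.where(w >= 0, 1.0, -1.0)


def rank_k_relative_error(a: np.ndarray, k: int) -> float:
    """Relative Frobenius error of the best rank-k approximation of `a`.

    By Eckart-Young this is sqrt(sum of discarded squared singular values
    / sum of all squared singular values).
    """
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2) / np.sum(s ** 2)))


def rademacher_baseline(shape: tuple[int, int], seed: int = 0) -> np.ndarray:
    """i.i.d. +/-1 matrix with equal probabilities, used as the reference pattern."""
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=shape)


# Usage: stand-in "trained" weights; replace with a real layer's weight matrix.
w = np.random.default_rng(0).normal(size=(256, 256))
k = 25  # keep roughly 10% of the singular values
print(rank_k_relative_error(sign_matrix(w), k))
print(rank_k_relative_error(rademacher_baseline(w.shape), k))

# Contrast: a structured pattern sign(u v^T) = sign(u) sign(v)^T is exactly rank 1.
u, v = np.random.default_rng(1).normal(size=(2, 64))
print(rank_k_relative_error(sign_matrix(np.outer(u, v)), 1))  # close to 0
```

If a learned sign pattern had exploitable low-rank structure, its error would fall well below the Rademacher baseline; the abstract reports that learned sign matrices do not.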

