LG · May 14

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang

arXiv:2605.14521 · 57.2

Predicted impact: top 40% in LG · last 90 days

Originality: Incremental advance

AI Analysis

For deep learning practitioners, this method reduces inference cost of LN without sacrificing accuracy, though it is incremental as it builds on existing RMSNorm and weight centering techniques.

Layer normalization (LN) introduces inference overhead due to centering; this paper proposes a framework to replace LN with RMSNorm in arbitrary DNNs without changing the model function, achieving 2-12% end-to-end acceleration while maintaining performance.

Layer normalization (LN) is a fundamental component in modern deep learning, but its per-sample centering and scaling introduce non-negligible inference overhead. RMSNorm improves efficiency by removing the centering operation, yet this may discard benefits associated with centering. This paper proposes a framework to determine whether an LN in an arbitrary DNN can be replaced by RMSNorm without changing the model function. The key idea is to fold LN's centering operation into upstream general linear layers by enforcing zero-mean outputs through the column-centered constraint (CCC) and column-based weight centering (CBWC). We extend the analysis to arbitrary DNNs, define such LNs as foldable LNs, and develop a graph-based detection algorithm. Our analysis shows that many LNs in widely used architectures are foldable, enabling exact inference-time conversion and end-to-end acceleration of 2% to 12% without changing model predictions. Experiments across multiple task families further show that, when exact equivalence is partially broken in practical training settings, our method remains competitive with vanilla LN while improving efficiency.
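The folding idea in the abstract can be illustrated with a minimal NumPy sketch for the simplest case: one linear layer feeding one LN. This is not the paper's implementation. The function names (`layer_norm`, `rms_norm`, `center_linear`), the weight layout `W` of shape `(d_out, d_in)` with `y = W x + b`, and the choice to keep LN's bias `beta` in the RMSNorm variant are all assumptions made for illustration; the paper's exact definitions of CCC and CBWC may differ.

```python
import numpy as np

def layer_norm(y, gamma, beta, eps=1e-5):
    # Standard LN over the last axis: center, scale by std, then affine.
    mu = y.mean(axis=-1, keepdims=True)
    var = y.var(axis=-1, keepdims=True)  # population variance (ddof=0)
    return (y - mu) / np.sqrt(var + eps) * gamma + beta

def rms_norm(y, gamma, beta, eps=1e-5):
    # RMSNorm: no centering, scale by root mean square.
    # Keeping a bias term here is an assumption so the affine part matches LN.
    ms = (y ** 2).mean(axis=-1, keepdims=True)
    return y / np.sqrt(ms + eps) * gamma + beta

def center_linear(W, b):
    # Hypothetical helper: subtract from each column of W its mean over the
    # output dimension, and center the bias. The layer output then has zero
    # mean across features for every input x.
    W_c = W - W.mean(axis=0, keepdims=True)
    b_c = b - b.mean()
    return W_c, b_c

rng = np.random.default_rng(0)
d_in, d_out, batch = 16, 32, 4
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)
gamma = rng.normal(size=d_out)
beta = rng.normal(size=d_out)
x = rng.normal(size=(batch, d_in))

# Original model: linear layer followed by LN.
ln_out = layer_norm(x @ W.T + b, gamma, beta)

# Folded model: centered linear layer followed by RMSNorm.
W_c, b_c = center_linear(W, b)
y_c = x @ W_c.T + b_c
rms_out = rms_norm(y_c, gamma, beta)

max_abs_diff = np.abs(ln_out - rms_out).max()
print("max |LN - RMSNorm(folded)|:", max_abs_diff)
```

Because the centered layer already produces zero-mean outputs, the mean square used by RMSNorm equals the variance used by LN, so the two outputs agree up to floating-point error. In this sketch the centering is done once, offline, on the weights, which is why the per-sample centering cost disappears at inference time.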
