LG · May 1, 2025

On the Importance of Gaussianizing Representations

arXiv:2505.00685v2 · 6 citations · h-index: 16 · ICML
Originality: Highly original
AI Analysis

This work addresses a foundational question in deep learning, namely what distribution activations should follow, and offers a method for shaping activation distributions toward normality that could apply wherever normalization layers are already used.

The authors ask what distribution neural network activations should follow and propose normality normalization, a novel layer that encourages Gaussian feature representations by applying a power transform and adding Gaussian noise during training. They report improved generalization and robustness to random perturbations across a range of models and datasets.
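The two ingredients named above, a power transform followed by additive Gaussian noise at training time, can be sketched for a single feature's minibatch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of the Yeo-Johnson transform, the fixed `lam` argument, the `noise_std` parameter, and the second standardization step are all hypothetical, since the summary does not say how the transform parameter is estimated or how the noise is scaled.

```python
import math
import random
import statistics


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson power transform of one value (assumed choice of power transform)."""
    if x >= 0:
        if abs(lam) < 1e-8:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-8:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def standardize(values, eps=1e-5):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def skewness(values):
    """Sample skewness; zero for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)
    return statistics.fmean([((v - mean) / std) ** 3 for v in values])


def gaussianize(values, lam, noise_std=0.0, training=False, rng=None):
    """Hypothetical layer: standardize, power-transform, re-standardize, add noise.

    lam       -- power-transform parameter, supplied by the caller (assumption)
    noise_std -- standard deviation of the additive Gaussian noise (assumption)
    training  -- noise is only added when this is True
    """
    z = standardize(values)
    t = standardize([yeo_johnson(v, lam) for v in z])
    if training and noise_std > 0.0:
        rng = rng or random.Random()
        t = [v + rng.gauss(0.0, noise_std) for v in t]
    return t


# Usage: right-skewed "activations" become more symmetric after the transform.
demo_rng = random.Random(0)
activations = [demo_rng.expovariate(1.0) for _ in range(2000)]
transformed = gaussianize(activations, lam=0.0)
print("skewness before:", skewness(activations))
print("skewness after: ", skewness(transformed))
```

With `lam=1.0` the transform is the identity, so the sketch reduces to plain standardization; values of `lam` below 1 compress the right tail, which is why the skewed example above becomes more symmetric.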

The normal distribution plays a central role in information theory: it is at the same time the best-case signal and worst-case noise distribution, has the greatest representational capacity of any distribution, and offers an equivalence between uncorrelatedness and independence for joint distributions. Accounting for the mean and variance of activations throughout the layers of deep neural networks has had a significant effect on facilitating their effective training, but seldom has a prescription for precisely what distribution these activations should take, and how this might be achieved, been offered. Motivated by the information-theoretic properties of the normal distribution, we address this question and concurrently present normality normalization: a novel normalization layer which encourages normality in the feature representations of neural networks using the power transform and employs additive Gaussian noise during training. Our experiments comprehensively demonstrate the effectiveness of normality normalization, with regard to its generalization performance on an array of widely used model and dataset combinations, its strong performance across various common factors of variation such as model width, depth, and training minibatch size, its suitability for usage wherever existing normalization layers are conventionally used, and its use as a means of improving model robustness to random perturbations.
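Two of the properties cited in the abstract can be stated precisely. These are standard textbook results rather than derivations taken from the paper, and reading "greatest representational capacity" as the maximum-entropy property is an assumption about what the authors mean.

```latex
% Maximum entropy: among all real-valued random variables X with variance
% \sigma^2, the differential entropy is bounded by that of the Gaussian,
\[
  h(X) \;\le\; \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^{2}\right),
  \qquad \text{with equality iff } X \sim \mathcal{N}(\mu, \sigma^{2}).
\]

% Uncorrelatedness and independence: if (X, Y) is jointly Gaussian, then
\[
  \operatorname{Cov}(X, Y) = 0
  \;\Longleftrightarrow\;
  p_{X,Y}(x, y) = p_X(x)\, p_Y(y).
\]
```

For non-Gaussian joint distributions only the right-to-left direction holds in general, which is part of why Gaussian representations are singled out.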

Code Implementations: 1 repo