LG · ML · Oct 7, 2022

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

arXiv:2210.03820v2 · 26 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This work advances the theoretical understanding of training dynamics for neural networks with practical components such as biases and normalization layers, although it is an incremental extension of prior results for homogeneous networks.

The paper studies the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow, generalizing existing results for homogeneous networks to models with biases and normalization layers. It finds that gradient flow favors a subset of the parameters, which can degrade robustness but may also yield sparser parameterizations.

In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse.
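The abstract does not spell out what "quasi-homogeneous" means. As a hedged illustration, assume the property is stated as f(x; e^{tΛ}θ) = e^t f(x; θ) for every real t, where Λ is a diagonal matrix of nonnegative per-parameter weights; a network homogeneous of degree L is then the special case Λ = I/L. The sketch below checks this scaling law numerically for a one-hidden-unit ReLU model with a hidden bias b and an output bias c. The model, the parameter names, and the weights in LAMBDAS are hypothetical choices for this example, not taken from the paper.

```python
import math

# Hypothetical toy model (not from the paper): one ReLU hidden unit with a
# hidden bias b and an output bias c, i.e. f(x) = a * relu(w*x + b) + c.
def model(x, params):
    w, b, a, c = params
    return a * max(w * x + b, 0.0) + c

# Assumed quasi-homogeneous weights (lambda_w, lambda_b, lambda_a, lambda_c).
# w and b share a weight so the pre-activation w*x + b scales uniformly,
# a then needs 1 - lambda_w, and the output bias c needs weight 1.
LAMBDAS = (0.5, 0.5, 0.5, 1.0)

def scale(params, t, lambdas=LAMBDAS):
    """Apply theta -> exp(t * Lambda) theta for a diagonal Lambda."""
    return tuple(math.exp(t * lam) * p for p, lam in zip(params, lambdas))

def is_quasi_homogeneous(params, xs, ts, lambdas=LAMBDAS, tol=1e-9):
    """Check f(x; exp(t Lambda) theta) == exp(t) f(x; theta) on sample points."""
    for x in xs:
        for t in ts:
            lhs = model(x, scale(params, t, lambdas))
            rhs = math.exp(t) * model(x, params)
            if not math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol):
                return False
    return True

theta = (1.3, -0.4, 0.7, 0.2)
xs = [-2.0, -0.5, 0.0, 0.3, 1.5]
ts = [-1.0, 0.0, 0.5, 2.0]

print(is_quasi_homogeneous(theta, xs, ts))                        # True
print(is_quasi_homogeneous(theta, xs, ts, (0.5, 0.5, 0.5, 0.5)))  # False: c scales too slowly
```

With the output bias set to zero, the model is 2-homogeneous and the uniform weights (0.5, 0.5, 0.5, 0.5) pass as well. With c nonzero, c must carry the largest weight, so the parameters are no longer treated symmetrically. This is the kind of asymmetry the abstract refers to, although this check says nothing about how gradient flow behaves on such a model.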

