LG Oct 29, 2022

Perturbation Analysis of Neural Collapse

Tom Tirer, Haoxiang Huang, Jonathan Niles-Weed

arXiv:2210.16658v220.534 citations h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses a theoretical gap in the understanding of neural collapse that is relevant to deep learning researchers. The advance is incremental, since it extends prior idealized unconstrained features models.

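The paper starts from the observation that neural collapse, a phenomenon seen when training deep neural networks, is typically not exact in practice, for example because deep layers cannot arbitrarily modify intermediate features that are far from collapsed. It proposes a model that keeps the features in the vicinity of a predefined features matrix and studies it via perturbation analysis to capture near-collapse behavior. It proves a reduction in within-class variability relative to the input features, analyzes the minimizers in the near-collapse regime, and supports the theory with experiments in practical deep learning settings.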

Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point. In this phase of training, a "neural collapse" behavior has been observed: the variability of features (outputs of the penultimate layer) of within-class samples decreases and the mean features of different classes approach a certain tight frame structure. Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse. However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features). We explore the model in the small vicinity case via perturbation analysis and establish results that cannot be obtained by the previously studied models. For example, we prove reduction in the within-class variability of the optimized features compared to the predefined input features (via analyzing gradient flow on the "central-path" with minimal assumptions), analyze the minimizers in the near-collapse regime, and provide insights on the effect of regularization hyperparameters on the closeness to collapse. We support our theory with experiments in practical deep learning settings.
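The two quantities the abstract refers to, within-class variability and closeness of the class means to a tight frame structure, can be measured directly from a features matrix. The following is a minimal NumPy sketch, not the paper's code: the function names `within_class_variability` and `etf_distance` are hypothetical, the trace ratio is a simplified proxy that may differ from the metric used in the paper, and the target structure is assumed to be the simplex equiangular tight frame commonly used in the neural collapse literature.

```python
import numpy as np


def within_class_variability(H, labels):
    """Trace ratio tr(S_W) / tr(S_B) for features H of shape (d, N).

    A simplified proxy (assumption): smaller values mean the features of
    each class sit closer to their class mean; 0 means exact collapse.
    """
    global_mean = H.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        Hc = H[:, labels == c]
        mu_c = Hc.mean(axis=1, keepdims=True)
        within += np.sum((Hc - mu_c) ** 2)
        between += Hc.shape[1] * np.sum((mu_c - global_mean) ** 2)
    return within / between


def etf_distance(H, labels):
    """Distance between the normalized Gram matrix of the centered class
    means and that of a simplex equiangular tight frame (assumed target)."""
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([H[:, labels == c].mean(axis=1) for c in classes], axis=1)
    centered = means - H.mean(axis=1, keepdims=True)
    gram = centered.T @ centered
    gram = gram / np.linalg.norm(gram)
    target = np.eye(K) - np.ones((K, K)) / K
    target = target / np.linalg.norm(target)
    return np.linalg.norm(gram - target)


# Toy example: K classes, n samples per class, exactly collapsed features.
K, n = 3, 5
labels = np.repeat(np.arange(K), n)
class_means = np.eye(K) - np.ones((K, K)) / K  # columns form a simplex ETF
H_collapsed = class_means[:, labels]

# Perturbed features: same means plus small within-class noise.
rng = np.random.default_rng(0)
H_noisy = H_collapsed + 0.1 * rng.normal(size=H_collapsed.shape)

nc1_collapsed = within_class_variability(H_collapsed, labels)
nc1_noisy = within_class_variability(H_noisy, labels)
etf_collapsed = etf_distance(H_collapsed, labels)
```

On exactly collapsed features both quantities vanish up to floating-point error, while any within-class noise makes the first one strictly positive. In a setting like the one the abstract describes, one would compare such a metric on the predefined input features against the optimized features.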
