The Persistence of Neural Collapse Despite Low-Rank Bias
This work contributes to the theoretical understanding of feature geometry in deep learning, offering researchers incremental insight into why neural collapse appears in practice even though it is not generally optimal.
The paper examines why neural collapse persists in deep networks. It extends the theoretical analysis of deep unconstrained feature models to cross-entropy loss and shows that high-rank structures such as deep neural collapse are not generally optimal because of a low-rank bias. It proves a fixed bound on the number of non-negligible singular values at global minima as depth increases, and experiments in deep UFMs and deep neural networks support the results.
Neural collapse (NC) and its multi-layer variant, deep neural collapse (DNC), describe a structured geometry that occurs in the features and weights of trained deep networks. Recent theoretical work by Sukenik et al. using a deep unconstrained feature model (UFM) suggests that DNC is suboptimal under mean squared error (MSE) loss. They heuristically argue that this is due to low-rank bias induced by L2 regularization. In this work, we extend this result to deep UFMs trained with cross-entropy loss, showing that high-rank structures, including DNC, are not generally optimal. We characterize the associated low-rank bias, proving a fixed bound on the number of non-negligible singular values at global minima as network depth increases. We further analyze the loss surface, demonstrating that DNC is more prevalent in the landscape than other critical configurations, which we argue explains its frequent empirical appearance. Our results are validated through experiments in deep UFMs and deep neural networks.
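To make the contrast between high-rank and low-rank structures concrete, the following minimal sketch counts the "non-negligible" singular values of a class-mean matrix. It is an illustration under stated assumptions, not the paper's procedure: the relative threshold `rel_tol`, the helper names, and the choice of a simplex equiangular tight frame (the class-mean geometry usually associated with neural collapse) as the high-rank example are all introduced here for illustration.

```python
import numpy as np


def count_non_negligible_singular_values(matrix, rel_tol=1e-3):
    """Count singular values larger than rel_tol times the largest one.

    rel_tol is an assumed, illustrative threshold; the paper's notion of
    "non-negligible" may be defined differently.
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


def simplex_etf(num_classes):
    """Class means arranged as a simplex equiangular tight frame (K x K)."""
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)


def fixed_rank_matrix(size, singular_values, seed=0):
    """Hypothetical low-rank alternative with prescribed singular values."""
    rng = np.random.default_rng(seed)
    # QR of a random square matrix gives an orthonormal basis.
    u, _ = np.linalg.qr(rng.standard_normal((size, size)))
    v, _ = np.linalg.qr(rng.standard_normal((size, size)))
    r = len(singular_values)
    return u[:, :r] @ np.diag(singular_values) @ v[:, :r].T


K = 10
etf_rank = count_non_negligible_singular_values(simplex_etf(K))
low_rank = count_non_negligible_singular_values(fixed_rank_matrix(K, [3.0, 2.0]))
print(f"simplex ETF (collapse-like) rank: {etf_rank}")
print(f"low-rank alternative rank:        {low_rank}")
```

For K classes the simplex ETF has K - 1 non-negligible singular values, so its rank grows with the number of classes, whereas the hypothetical alternative keeps a fixed small rank. Applying the same count to the features of each layer in a trained model is one way to probe for the low-rank bias described above.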