Neural Collapse by Design: Learning Class Prototypes on the Hypersphere
For practitioners of supervised classification, this work provides a unified framework and practical losses that achieve the theoretical optimum (Neural Collapse) faster and with better transfer and robustness.
This paper shows that cross-entropy and supervised contrastive learning are both forms of prototype contrast on the hypersphere. It proposes two normalized losses (NTCE, NONL) that closely approximate Neural Collapse (≥95%) and match cross-entropy's converged Neural Collapse on 4 of 5 metrics in under 7.5% of its iterations, while SCL with fixed prototypes matches linear probing without a separate classifier training phase. The learned geometry yields +5.5% mean relative improvement in transfer learning and up to +8.7% under severe class imbalance.
Supervised classification has a theoretical optimum, Neural Collapse (NC), yet neither of its two dominant paradigms reaches it in practice. Cross-entropy (CE) leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while supervised contrastive learning (SCL) drives features toward NC during pretraining but discards this structure in a post hoc linear probing phase. We show that both paradigms are instances of the same method, prototype contrast on the unit hypersphere, and that closing the gap requires fixing each at its specific point of failure. From the CE side, we propose NTCE and NONL, two normalized losses that import the missing ingredients of contrastive optimization into classifier learning: a large effective negative set and decoupled alignment and uniformity terms. From the SCL side, we prove that SCL's objective already optimizes, throughout training, for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful. Empirically, on four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC ($\geq 95\%$), and match CE's converged NC on 4/5 metrics in under $7.5\%$ of its iterations, while SCL with fixed prototypes matches linear probing without the hours-long classifier training phase. The learned geometry yields $+5.5\%$ mean relative improvement in transfer learning, up to $+8.7\%$ under severe class imbalance, and lower mCE on ImageNet-C. These results recast supervised learning as prototype learning on the hypersphere, with NC reached by design on both paths.
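To make the idea of a classifier whose weights are the class mean embeddings concrete, the following is a minimal NumPy sketch under stated assumptions, not the paper's implementation. It builds unit-norm class prototypes from normalized features and scores samples by temperature-scaled cosine similarity. The abstract does not define NTCE or NONL, so the loss shown is a generic cosine-softmax cross-entropy over prototypes; the function names, the temperature value of 0.1, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit hypersphere."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_mean_prototypes(features, labels, num_classes):
    """Unit-norm mean embedding per class.

    Assumes every class index in range(num_classes) occurs at least once.
    """
    z = l2_normalize(features)
    protos = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        protos[c] = z[labels == c].mean(axis=0)
    return l2_normalize(protos)


def prototype_logits(features, prototypes, temperature=0.1):
    """Cosine similarity between each sample and each prototype, scaled by 1/temperature."""
    return l2_normalize(features) @ prototypes.T / temperature


def prototype_cross_entropy(features, labels, prototypes, temperature=0.1):
    """Generic cosine-softmax cross-entropy over prototypes (illustrative, not NTCE/NONL)."""
    logits = prototype_logits(features, prototypes, temperature)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Synthetic stand-in for encoder outputs: three noisy clusters in 8 dimensions.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 8, 20
centers = l2_normalize(rng.normal(size=(num_classes, dim)))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + 0.05 * rng.normal(size=(labels.size, dim))

# The prototypes act as fixed classifier weights; no linear probe is trained.
prototypes = class_mean_prototypes(features, labels, num_classes)
predictions = prototype_logits(features, prototypes).argmax(axis=1)
accuracy = (predictions == labels).mean()
loss = prototype_cross_entropy(features, labels, prototypes)
```

In this sketch the prototypes are computed once from the embeddings and used directly as classifier weights, which mirrors the abstract's point that a separate linear probing phase is not needed when the encoder has already organized features around class means.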