LG · AI · Mar 2

The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks

arXiv:2603.02293v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses robust generalization for deep learning practitioners facing label noise. It offers a stable post-hoc method for mitigating memorization, though the contribution is incremental: it builds on existing spectral analysis and noise theory.

The paper tackles harmful overfitting in over-parameterized networks under label noise. It identifies the Malignant Tail mechanism, in which networks segregate noise into high-frequency subspaces, and shows that explicit spectral truncation can recover optimal generalization, with results comparable to noise-free training.

While implicit regularization facilitates benign overfitting in low-noise regimes, recent theoretical work predicts a sharp phase transition to harmful overfitting as the noise-to-signal ratio increases. We experimentally isolate the geometric mechanism of this transition: the Malignant Tail, a failure mode where networks functionally segregate signal and noise, reducing coherent semantic features into low-rank subspaces while pushing stochastic label noise into high-frequency orthogonal components, distinct from systematic or corruption-aligned noise. Through a Spectral Linear Probe of training dynamics, we demonstrate that Stochastic Gradient Descent (SGD) fails to suppress this noise, instead implicitly biasing it toward high-frequency orthogonal subspaces, effectively preserving signal-noise separability. We show that this geometric separation is distinct from simple variance reduction in untrained models. In trained networks, SGD actively segregates noise, allowing post-hoc Explicit Spectral Truncation (d << D) to surgically prune the noise-dominated subspace. This approach recovers the optimal generalization capability latent in the converged model. Unlike unstable temporal early stopping, Geometric Truncation provides a stable post-hoc intervention. Our findings suggest that under label noise, excess spectral capacity is not harmless redundancy but a latent structural liability that allows for noise memorization, necessitating explicit rank constraints to filter stochastic corruptions for robust generalization.
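The truncation step described in the abstract can be illustrated with a minimal sketch. This summary does not specify which layer's representations are truncated, how d is selected, or how the Spectral Linear Probe is fit, so the sketch assumes a plain PCA-style projection of a feature matrix onto its top-d principal directions. The function name `spectral_truncate`, the synthetic `features` matrix, and the chosen sizes are hypothetical and are not taken from the paper.

```python
import numpy as np


def spectral_truncate(features, d):
    """Project (N, D) features onto their top-d principal directions.

    Returns the rank-d reconstruction, the (d, D) orthonormal basis, and the
    fraction of total variance retained by the kept directions.
    Assumption: a PCA-style truncation; the paper's exact procedure may differ.
    """
    mean = features.mean(axis=0, keepdims=True)
    centered = features - mean
    # Rows of vt are principal directions ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:d]
    # Keep the leading subspace and discard the spectral tail.
    truncated = centered @ basis.T @ basis + mean
    retained = float((s[:d] ** 2).sum() / (s ** 2).sum())
    return truncated, basis, retained


# Synthetic stand-in for learned features (an assumption for this demo only):
# a rank-4 "signal" plus small isotropic perturbations across all D directions.
rng = np.random.default_rng(0)
N, D, d = 200, 32, 4
signal = rng.normal(size=(N, d)) @ rng.normal(size=(d, D))
features = signal + 0.05 * rng.normal(size=(N, D))

truncated, basis, retained = spectral_truncate(features, d)
print(f"kept {d} of {D} directions, variance retained: {retained:.4f}")
```

In a real setting, `features` would presumably come from a converged network, and d would be selected on held-out data before fitting a linear classifier on the truncated features. Those steps are left out here because the summary gives no details on them.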

