Investigating Why Contrastive Learning Benefits Robustness Against Label Noise
This work addresses label noise in deep learning, a problem relevant to both researchers and practitioners. It offers theoretical insight and empirical gains, though the contribution is incremental in that it builds on existing contrastive learning methods.
The paper investigates why contrastive learning improves robustness against label noise. It proves that contrastive learning yields representations whose prominent singular vectors align with the clean labels, which lets a linear layer learn the clean labels without overfitting the noise. It also reports state-of-the-art performance under extreme noise, with an average accuracy increase of 27.18% on CIFAR-10 with 80% symmetric noisy labels.
Self-supervised Contrastive Learning (CL) has recently been shown to be very effective in preventing deep networks from overfitting noisy labels. Despite its empirical success, the theoretical understanding of how contrastive learning boosts robustness is very limited. In this work, we rigorously prove that the representation matrix learned by contrastive learning boosts robustness by having: (i) one prominent singular value corresponding to each sub-class in the data, and significantly smaller remaining singular values; and (ii) a large alignment between the prominent singular vectors and the clean labels of each sub-class. These properties enable a linear layer trained on such representations to effectively learn the clean labels without overfitting the noise. We further show that the low-rank structure of the Jacobian of deep networks pre-trained with contrastive learning allows them to achieve superior initial performance when fine-tuned on noisy labels. Finally, we demonstrate that the initial robustness provided by contrastive learning enables robust training methods to achieve state-of-the-art performance under extreme noise levels, e.g., an average increase in accuracy of 27.18% and 15.58% on CIFAR-10 and CIFAR-100 with 80% symmetric noisy labels, and a 4.11% increase in accuracy on WebVision.
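Properties (i) and (ii) can be checked numerically on a toy example. The sketch below is not the paper's experiment: it assumes an idealized representation matrix in which every sub-class shares one orthonormal direction plus small Gaussian noise, as a hypothetical stand-in for contrastive features. The function names `synthetic_representations` and `spectrum_and_alignment`, and all parameter values, are illustrative assumptions.

```python
import numpy as np


def synthetic_representations(n_per_class=50, n_classes=4, dim=32,
                              noise=0.05, seed=0):
    """Toy representation matrix (assumption, not real contrastive features):
    each sub-class shares one orthonormal direction, plus small Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Orthonormal sub-class directions, stored as the columns of `centers`.
    centers, _ = np.linalg.qr(rng.standard_normal((dim, n_classes)))
    labels = np.repeat(np.arange(n_classes), n_per_class)
    reps = centers.T[labels] + noise * rng.standard_normal((labels.size, dim))
    return reps, labels


def spectrum_and_alignment(reps, labels, n_classes):
    """Return the singular values of `reps` and, for each sub-class, the
    fraction of its unit-norm clean-label indicator vector that lies in the
    span of the top `n_classes` left singular vectors."""
    U, S, _ = np.linalg.svd(reps, full_matrices=False)
    Y = np.eye(n_classes)[labels]          # one-hot clean labels, shape (n, K)
    Y = Y / np.linalg.norm(Y, axis=0)      # normalize each label column
    alignment = np.sum((U[:, :n_classes].T @ Y) ** 2, axis=0)
    return S, alignment


if __name__ == "__main__":
    reps, labels = synthetic_representations()
    S, alignment = spectrum_and_alignment(reps, labels, n_classes=4)
    print("top singular values:", np.round(S[:6], 3))
    print("alignment per sub-class:", np.round(alignment, 3))
```

In this toy setting the first four singular values stand well apart from the rest, and each clean-label vector lies almost entirely in the span of the top four singular vectors. This is the structure that, according to the abstract, lets a linear layer fit the clean labels before it fits the noise.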