Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning
This work addresses label noise in supervised contrastive learning (SCL) for computer vision and reports improved robustness to mislabeled data, though the contribution is incremental in that it builds on existing SCL methods.
The paper tackles the problem of label noise in supervised contrastive learning (SCL), finding that human labeling errors manifest as easy positive samples in about 99% of cases, and proposes D-SCL, a debiased objective that consistently outperforms state-of-the-art methods in representation learning across vision benchmarks.
Image classification datasets exhibit a non-negligible fraction of mislabeled examples, often due to human error when one class superficially resembles another. This issue poses challenges in supervised contrastive learning (SCL), where the goal is to cluster together data points of the same class in the embedding space while distancing those of disparate classes. Although such methods outperform those based on cross-entropy, they are not immune to labeling errors. Moreover, whereas the detrimental effects of noisy labels in supervised learning are well researched, their influence on SCL remains largely unexplored. Hence, we analyse the effect of label errors and examine how they disrupt the SCL algorithm's ability to distinguish between positive and negative sample pairs. Our analysis reveals that human labeling errors manifest as easy positive samples in around 99% of cases. We therefore propose D-SCL, a novel Debiased Supervised Contrastive Learning objective designed to mitigate the bias introduced by labeling errors. We demonstrate that D-SCL consistently outperforms state-of-the-art techniques for representation learning across diverse vision benchmarks, offering improved robustness to label errors.
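To make concrete how a label error changes which pairs count as positive or negative, the following is a minimal sketch of the standard supervised contrastive (SupCon-style) objective on which SCL methods are commonly based. It is not the D-SCL objective, whose form is not given here. The function name `supcon_loss`, the temperature value, and the toy 2-D embeddings are all illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss (illustrative; NOT D-SCL).

    For each anchor, every other sample with the same label is a positive
    and every other sample in the batch appears in the denominator.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over the denominator set.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Hypothetical toy batch: two tight clusters in 2-D.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clean_labels = [0, 0, 1, 1]
noisy_labels = [0, 1, 1, 1]  # sample 1 sits in cluster 0 but is labeled 1

clean_loss = supcon_loss(embeddings, clean_labels)
noisy_loss = supcon_loss(embeddings, noisy_labels)
print("clean:", clean_loss, "noisy:", noisy_loss)
```

In this toy setup, flipping one label turns a sample from the other cluster into a false positive for the class-1 anchors, and the loss rises accordingly. The example only illustrates the pair-assignment mechanism; it makes no claim about how D-SCL corrects for it.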