Unlocking the Power of Open Set: A New Perspective for Open-Set Noisy Label Learning
This work addresses a common real-world scenario in machine learning, where datasets are corrupted by multiple types of label noise, and offers a novel approach to improving robustness.
The paper tackles the problem of learning from datasets containing both open-set and closed-set label noise by proposing a two-step contrastive learning method, CECL, which exploits open-set examples to enhance performance, achieving state-of-the-art results on synthetic and real-world datasets.
Learning from noisy data has attracted much attention, but most methods focus on closed-set label noise. A more common scenario in the real world is the presence of both open-set and closed-set noise. Existing methods typically identify and handle these two types of label noise separately, designing a specific strategy for each type. In many real-world scenarios, however, identifying open-set examples is challenging, especially when the dataset is severely corrupted. Unlike previous work, we explore how models behave when faced with open-set examples and find that \emph{a portion of open-set examples gradually become integrated into certain known classes}, which benefits the separation among known classes. Motivated by this phenomenon, we propose a novel two-step contrastive learning method, CECL (Class Expansion Contrastive Learning), which deals with both types of label noise by exploiting the useful information in open-set examples. Specifically, we incorporate some open-set examples into closed-set classes to enhance performance while treating others as delimiters to improve representation ability. Extensive experiments on synthetic and real-world datasets with diverse label noise demonstrate the effectiveness of CECL.
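The abstract does not spell out the loss, so the following is only a minimal sketch of the "delimiter" idea, not the CECL implementation. It assumes a supervised-contrastive-style objective in which open-set examples absorbed into a known class simply carry that class's label, while the remaining open-set examples carry a hypothetical `DELIMITER` label: they are never anchors and have no positives, but they appear among the negatives of every anchor. The function name, the `DELIMITER` marker, and the temperature value are all illustrative assumptions.

```python
import math

# Hypothetical marker for open-set examples that are kept only as negatives.
DELIMITER = -1


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss_with_delimiters(embeddings, labels, temperature=0.5):
    """Supervised-contrastive-style loss where DELIMITER examples act only as negatives.

    This is an assumed formulation for illustration, not the loss from the paper.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        if labels[i] == DELIMITER:
            continue  # delimiters are never used as anchors
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor needs at least one same-class example
        # Logits against every other example; delimiters are included here,
        # so they push anchors of all known classes away from themselves.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy usage: two known classes plus one open-set example treated as a delimiter.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
labels = [0, 0, 1, 1, DELIMITER]
loss = contrastive_loss_with_delimiters(embeddings, labels)
```

In this sketch, relabeling an open-set example from `DELIMITER` to a known class index is all it takes to "expand" that class, which keeps the two roles of open-set examples in a single loss.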