PENCIL: Deep Learning with Noisy Labels
PENCIL addresses the challenge of training deep learning models on noisy labels, which are common in real-world data collection. It offers a general and robust solution that needs neither an auxiliary clean dataset nor prior information about the noise.
The paper tackles the performance degradation that noisy labels cause in deep learning by proposing PENCIL, an end-to-end framework that jointly updates the network parameters and the label estimates, with the latter maintained as label distributions. PENCIL outperforms previous state-of-the-art methods by large margins on synthetic and real-world datasets across various noise types and noise rates.
Deep learning has achieved excellent performance in various computer vision tasks, but it requires many training examples with clean labels. It is easy to collect a dataset with noisy labels, but such noise causes networks to overfit severely and accuracy to drop dramatically. To address this problem, we propose an end-to-end framework called PENCIL, which updates both the network parameters and the label estimates, the latter represented as label distributions. PENCIL is independent of the backbone network structure and needs neither an auxiliary clean dataset nor prior information about the noise, so it is more general and robust than existing methods and is easy to apply. PENCIL can even be applied repeatedly to obtain better performance. PENCIL outperforms previous state-of-the-art methods by large margins on both synthetic and real-world datasets with different noise types and noise rates. PENCIL is also effective in multi-label classification tasks when a simple attention structure is added to the backbone network. Experiments show that PENCIL is also robust on clean datasets.
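To make the idea of updating label estimates as label distributions concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the loss, so the sketch assumes a reversed-KL classification term plus a cross-entropy compatibility term that ties the estimate to the original noisy label. The function names and the constants `scale`, `alpha`, and `lr` are hypothetical and chosen only for illustration. For simplicity the network prediction is held fixed, whereas the framework described above also updates the network parameters.

```python
import math


def softmax(logits):
    """Turn unconstrained label logits into a label distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_label_logits(noisy_label, num_classes, scale=10.0):
    """Start from the noisy one-hot label, scaled so softmax is near one-hot.

    `scale` is an illustrative constant, not taken from the text above.
    """
    return [scale if k == noisy_label else 0.0 for k in range(num_classes)]


def label_update_step(label_logits, pred, noisy_label, alpha=0.1, lr=1.0):
    """One gradient step on the label logits, with the prediction fixed.

    Assumed objective (an assumption, see the lead-in):
        (1 / c) * KL(pred || label_dist) + alpha * CE(noisy_one_hot, label_dist)
    With label_dist = softmax(label_logits), its gradient with respect to
    logit k is (label_dist[k] - pred[k]) / c + alpha * (label_dist[k] - one_hot[k]).
    """
    c = len(label_logits)
    label_dist = softmax(label_logits)
    updated = []
    for k in range(c):
        one_hot = 1.0 if k == noisy_label else 0.0
        grad = (label_dist[k] - pred[k]) / c + alpha * (label_dist[k] - one_hot)
        updated.append(label_logits[k] - lr * grad)
    return updated


if __name__ == "__main__":
    # A sample labeled as class 0, while the (fixed) network favors class 1.
    logits = init_label_logits(noisy_label=0, num_classes=3)
    prediction = [0.05, 0.90, 0.05]
    for _ in range(200):
        logits = label_update_step(logits, prediction, noisy_label=0)
    print([round(p, 3) for p in softmax(logits)])
```

In this toy setting the label estimate starts at the noisy class and drifts toward the class the network predicts, while the compatibility term keeps some probability mass on the original label. Because the estimate is a distribution rather than a hard label, it can express that kind of partial correction.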