LG, CV · Dec 5, 2020

A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations?

arXiv:2012.03061v1 · 59 citations
AI Analysis

This survey addresses a problem that researchers and practitioners face across many domains: training robust deep learning models when label correctness cannot be guaranteed.

This paper surveys various deep learning techniques designed to handle noisy labels, a common issue in automatically collected or human-annotated datasets. It categorizes existing algorithms into robust losses, sample weighting, sample selection, meta-learning, and combined approaches, and also reviews experimental setups, datasets, and state-of-the-art results.

Noisy labels are commonly present in data sets that are automatically collected from the internet or mislabeled by non-specialist annotators, or even by specialists on a challenging task, such as in the medical field. Although deep learning models have shown significant improvements in different domains, an open issue is their ability to memorize noisy labels during training, which reduces their generalization potential. As deep learning models depend on correctly labeled data sets and label correctness is difficult to guarantee, it is crucial to consider the presence of noisy labels in deep learning training. Several approaches have been proposed in the literature to improve the training of deep learning models in the presence of noisy labels. This paper presents a survey of the main techniques in the literature, in which we classify the algorithms into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches. We also present the commonly used experimental setup, data sets, and results of the state-of-the-art models.
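To make three of the ideas named above more concrete, here is a minimal sketch in plain Python. It is not taken from the survey. The function names are hypothetical, and the specific choices are assumptions made for illustration: symmetric label flipping as the synthetic noise model, mean absolute error as the bounded "robust" loss, and keeping the smallest-loss samples as the selection rule.

```python
import math
import random


def inject_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with
    probability noise_rate (an assumed synthetic noise model)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            other_classes = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(y)
    return noisy


def cross_entropy(probs, label):
    """Standard cross-entropy; grows without bound as probs[label] -> 0."""
    return -math.log(probs[label])


def mean_absolute_error(probs, label):
    """MAE between a one-hot target and a probability vector.
    If probs sums to 1 this equals 2 * (1 - probs[label]), so it is
    bounded by 2 and a confidently 'wrong' sample cannot dominate."""
    return 2.0 * (1.0 - probs[label])


def select_small_loss(losses, keep_ratio):
    """Return the (sorted) indices of the keep_ratio fraction of samples
    with the smallest loss, treating them as likely clean."""
    k = max(1, int(round(keep_ratio * len(losses))))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:k])


if __name__ == "__main__":
    clean = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
    noisy = inject_symmetric_noise(clean, num_classes=3, noise_rate=0.4)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print("flipped labels:", flipped)

    # A prediction that strongly disagrees with the (possibly noisy) label.
    probs = [0.01, 0.99]
    print("cross-entropy:", cross_entropy(probs, 0))
    print("MAE:", mean_absolute_error(probs, 0))

    print("kept indices:", select_small_loss([0.1, 3.0, 0.2, 5.0], 0.5))
```

The contrast between the two losses shows why bounded losses are considered more tolerant of mislabeled samples: a single confidently contradicted label produces a large cross-entropy value but a capped MAE value. In a real training loop these pieces would operate on model outputs per mini-batch rather than on hand-written lists.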

