LG · Apr 8, 2021

A Theoretical Analysis of Learning with Noisily Labeled Data

arXiv:2104.04114v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This paper provides theoretical insights for deep learning researchers dealing with noisy data, but it is incremental: it builds on existing empirical studies without introducing new methods.

The paper tackles the problem of understanding training behavior with noisy labels by theoretically analyzing two phenomena: clean data being learned first, and a phase transition in testing error governed by the corruption rate. It shows that, once the clean data has been learned, continued training improves testing error only if the rate of corrupted labels is below a certain threshold; otherwise, extended training can increase it.

Noisy labels are very common in deep supervised learning. Although many studies aim to improve the robustness of deep training to noisy labels, few works focus on theoretically explaining the training behaviors of learning with noisily labeled data, which is fundamental to understanding its generalization. In this draft, we study two such phenomena, clean data first and phase transition, and explain them from a theoretical viewpoint. Specifically, we first show that in the first epoch of training, the examples with clean labels are learned first. We then show that after this stage of learning from clean data, continued training can further improve the testing error when the rate of corrupted class labels is smaller than a certain threshold; otherwise, extensive training could lead to an increasing testing error.
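
One way to observe the "clean data first" behavior empirically is to track, during training, how well a model fits the cleanly labeled training examples versus the corrupted ones. The sketch below is a toy illustration only, not the paper's setup or analysis: it assumes binary labels, symmetric label flipping, synthetic Gaussian data, and a logistic regression model trained with full-batch gradient descent. All function names and parameters (`make_data`, `corrupt_labels`, `train_and_track`, `rate`, `steps`, `lr`) are hypothetical.

```python
import numpy as np


def make_data(n, dim, rng):
    """Two Gaussian blobs with labels in {0, 1} (a toy stand-in for real data)."""
    y = rng.integers(0, 2, size=n)
    # Class 1 is centered at +1 in every coordinate, class 0 at -1.
    centers = np.where(y[:, None] == 1, 1.0, -1.0)
    x = centers + rng.normal(size=(n, dim))
    return x, y


def corrupt_labels(y, rate, rng):
    """Flip exactly round(rate * n) binary labels (assumed symmetric noise).

    Returns the noisy labels and a boolean mask marking the flipped examples.
    """
    n = len(y)
    flipped = np.zeros(n, dtype=bool)
    idx = rng.choice(n, size=int(round(rate * n)), replace=False)
    flipped[idx] = True
    return np.where(flipped, 1 - y, y), flipped


def train_and_track(x, y_noisy, flipped, x_test, y_test, steps=50, lr=0.1):
    """Train logistic regression on noisy labels with gradient descent.

    After each step, record three numbers:
      column 0: fraction of clean training examples whose given label is fit,
      column 1: fraction of corrupted training examples whose given label is fit,
      column 2: error on a clean test set.
    """
    w = np.zeros(x.shape[1])
    b = 0.0
    history = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(label = 1)
        grad = p - y_noisy                       # gradient of the log loss w.r.t. the logits
        w -= lr * (x.T @ grad) / len(x)
        b -= lr * grad.mean()
        pred = (x @ w + b > 0).astype(int)
        test_pred = (x_test @ w + b > 0).astype(int)
        history.append((
            np.mean(pred[~flipped] == y_noisy[~flipped]),
            np.mean(pred[flipped] == y_noisy[flipped]),
            np.mean(test_pred != y_test),
        ))
    return np.array(history)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = make_data(2000, 5, rng)
    x_test, y_test = make_data(1000, 5, rng)
    y_noisy, flipped = corrupt_labels(y, rate=0.2, rng=rng)
    hist = train_and_track(x, y_noisy, flipped, x_test, y_test)
    for step in (0, 9, 49):
        clean_fit, corrupt_fit, test_err = hist[step]
        print(f"step {step + 1}: clean fit {clean_fit:.3f}, "
              f"corrupted fit {corrupt_fit:.3f}, test error {test_err:.3f}")
```

Because a linear model cannot memorize the flipped labels, this sketch only shows how to measure the early gap between clean and corrupted examples. Probing the threshold effect described above would require a model with enough capacity to fit the noise, repeated over several values of `rate`.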

