CV · AI · LG · Dec 3, 2022

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

arXiv:2212.01674v2 · 12 citations · h-index: 47
AI Analysis

This paper addresses noisy labels in deep learning, which can degrade model performance. The contribution is incremental: it builds on existing label correction and co-teaching methods.

The paper tackles the robustness of deep learning to label noise by proposing CrossSplit, a training procedure in which two networks are trained on disjoint splits of the data, each relying on its peer for label correction and on the other split's inputs for semi-supervised training. The authors report that the method can outperform the prior state of the art on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision across a wide range of noise ratios.

We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.
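The cross-split label correction idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact correction rule, so the fixed convex combination below is an assumption made for illustration, and the names `split_indices`, `cross_split_soft_labels`, and `weight` are hypothetical rather than taken from the paper. The peer predictions are replaced by a uniform placeholder instead of a trained network.

```python
import numpy as np


def split_indices(n, seed=0):
    """Randomly partition range(n) into two disjoint index sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    half = n // 2
    return perm[:half], perm[half:]


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]


def cross_split_soft_labels(noisy_labels, peer_probs, weight):
    """Blend the given (possibly noisy) labels with the peer's predictions.

    `weight` is a hypothetical mixing coefficient in [0, 1]: 1.0 keeps the
    given label unchanged, 0.0 trusts the peer network completely.
    This fixed blend is an assumption, not the paper's exact rule.
    """
    num_classes = peer_probs.shape[1]
    return weight * one_hot(noisy_labels, num_classes) + (1.0 - weight) * peer_probs


# Toy dataset of six examples with three classes.
labels = np.array([0, 1, 2, 0, 1, 2])
idx_a, idx_b = split_indices(len(labels))

# Placeholder for the predictions of the network trained on split B,
# evaluated on the inputs of split A (which it never saw with labels).
peer_probs_on_a = np.full((len(idx_a), 3), 1.0 / 3.0)

# Soft training targets for the network trained on split A.
targets_a = cross_split_soft_labels(labels[idx_a], peer_probs_on_a, weight=0.7)
```

Because the peer network is trained only on the other split, it cannot have memorized the labels it is asked to adjust, which is the motivation the abstract gives for the split. In a real training loop the same step would be applied symmetrically to split B using the network trained on split A.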

