LG · Nov 20, 2022

Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss

arXiv:2211.10906v3 · 5 citations · h-index: 79
Originality: Incremental advance
AI Analysis

The paper addresses a common issue in real-world applications, where training data is both class-imbalanced and mislabeled. The advance is incremental because it builds on existing techniques that handle long-tailed data or noisy data separately.

The paper tackles the problem of training deep neural networks on datasets that are both long-tailed and noisy, proposing a method that combines sample selection with a balanced loss to achieve state-of-the-art performance on benchmarks.

The success of deep learning depends on large-scale and well-curated training data, while data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to deal with long-tailed data or noisy data, while few methods have been developed to tackle long-tailed noisy data. To address this, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set with sample selection, and train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
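
The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how samples are selected or how model bias enters the loss, so the sketch assumes a per-class small-loss criterion for selection and a logit adjustment by the log of the model's mean predicted class probability for the balanced loss. All function names and the `keep_ratio` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def select_clean(losses, labels, keep_ratio=0.5):
    """Assumed selection rule: within each class, keep the lowest-loss samples.

    Selecting per class (rather than globally) keeps tail classes represented.
    Returns (clean_indices, unlabeled_indices); the second set would be
    treated as unlabeled data for semi-supervised training.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    clean = []
    for idxs in by_class.values():
        ranked = sorted(idxs, key=lambda i: losses[i])
        k = max(1, int(keep_ratio * len(idxs)))  # keep at least one per class
        clean.extend(ranked[:k])
    clean_set = set(clean)
    unlabeled = [i for i in range(len(labels)) if i not in clean_set]
    return sorted(clean), unlabeled


def estimate_bias(all_logits):
    """Assumed bias estimate: mean predicted probability per class."""
    n = len(all_logits)
    bias = [0.0] * len(all_logits[0])
    for logits in all_logits:
        for j, p in enumerate(softmax(logits)):
            bias[j] += p / n
    return bias


def balanced_cross_entropy(logits, label, bias):
    """Assumed balanced loss: add log(bias) to the logits before the loss.

    Classes the model already favors get a boost inside the loss, so the
    network must produce a larger margin to fit under-predicted classes.
    """
    adjusted = [z + math.log(b) for z, b in zip(logits, bias)]
    return cross_entropy(adjusted, label)
```

With a uniform bias estimate the adjusted loss reduces to ordinary cross-entropy, since softmax is invariant to a constant shift of the logits. With a bias skewed toward head classes, samples labeled with a tail class incur a larger loss than under plain cross-entropy.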

