Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss
This work addresses a common problem in real-world applications, where data are imbalanced and contain label errors. It is, however, incremental, since it builds on existing techniques that handle long-tailed or noisy data separately.
The paper tackles the problem of training deep neural networks on datasets that are both long-tailed and noisy. It proposes a method that combines sample selection with a balanced loss and reports state-of-the-art performance on benchmarks.
The success of deep learning depends on large-scale, well-curated training data, whereas data in real-world applications are commonly long-tailed and noisy. Many methods have been proposed to deal with either long-tailed data or noisy data, but only a few have been developed to tackle data that are both long-tailed and noisy. To address this gap, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we use sample selection to separate the noisy training data into a clean labeled set and an unlabeled set, and then train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
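The abstract names the two components but does not specify them, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' method. `select_clean` assumes a per-class small-loss criterion (a common heuristic in noisy-label learning) for splitting samples into a clean labeled set and an unlabeled set. `balanced_cross_entropy` assumes that "model bias" means the model's mean predicted class distribution, and it plugs that estimate into a logit-adjusted cross-entropy. The function names and the `keep_ratio` and `tau` parameters are all hypothetical.

```python
import math
from collections import defaultdict


def select_clean(losses, labels, keep_ratio=0.5):
    """Split sample indices into a clean labeled set and an unlabeled set.

    Hypothetical per-class small-loss criterion: within each observed class,
    the keep_ratio fraction of samples with the lowest loss is treated as clean.
    Selecting per class keeps tail classes, whose samples tend to have higher
    loss overall, from being discarded wholesale.
    """
    by_class = defaultdict(list)
    for idx, (loss, y) in enumerate(zip(losses, labels)):
        by_class[y].append((loss, idx))
    clean = []
    for items in by_class.values():
        items.sort()  # ascending loss; ties broken by index
        k = max(1, math.ceil(keep_ratio * len(items)))  # keep at least one per class
        clean.extend(idx for _, idx in items[:k])
    clean_set = set(clean)
    # Samples not selected keep their inputs but drop their (possibly wrong) labels.
    unlabeled = [i for i in range(len(labels)) if i not in clean_set]
    return sorted(clean), unlabeled


def estimate_bias(prob_rows):
    """Assumed 'model bias': the mean predicted class distribution."""
    n = len(prob_rows)
    num_classes = len(prob_rows[0])
    return [sum(row[c] for row in prob_rows) / n for c in range(num_classes)]


def balanced_cross_entropy(logits, label, bias, tau=1.0, eps=1e-12):
    """Cross-entropy on logits shifted by tau * log(bias) (logit adjustment).

    Classes the model over-predicts get a head start inside the loss, so
    samples of under-predicted (typically tail) classes incur a larger loss.
    """
    adjusted = [z + tau * math.log(b + eps) for z, b in zip(logits, bias)]
    m = max(adjusted)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_sum_exp - adjusted[label]


if __name__ == "__main__":
    clean, unlabeled = select_clean([0.1, 2.0, 0.3, 0.2, 1.5, 0.05], [0, 0, 0, 1, 1, 2])
    print("clean:", clean, "unlabeled:", unlabeled)
    bias = estimate_bias([[0.8, 0.2], [0.6, 0.4]])
    print("bias:", bias)
    print("loss:", balanced_cross_entropy([2.0, 0.0], 1, bias))
```

In this sketch the bias is estimated from the model's predictions rather than from label counts, on the assumption that noisy labels can misstate the true class distribution. With a uniform bias, the adjusted loss reduces to ordinary cross-entropy. The semi-supervised step on the unlabeled set (for example, pseudo-labeling) is omitted.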