LG | Feb 12, 2025

Enhancing Sample Selection Against Label Noise by Cutting Mislabeled Easy Examples

arXiv:2502.08227v3 | 4 citations | h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses label noise in machine learning, offering an incremental improvement by focusing on a specific harmful subset of mislabeled examples.

The paper targets mislabeled easy examples (MEEs) in learning with noisy labels. It proposes Early Cutting, a method that recalibrates sample selection to filter out MEEs, and reports improved model performance on CIFAR, WebVision, and ImageNet-1k.

Sample selection is a prevalent approach in learning with noisy labels, aiming to identify confident samples for training. Although existing sample selection methods have achieved decent results by reducing the noise rate of the selected subset, they often overlook that not all mislabeled examples harm the model's performance equally. In this paper, we demonstrate that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We refer to these examples as Mislabeled Easy Examples (MEEs). To address this, we propose Early Cutting, which introduces a recalibration step that employs the model's later training state to re-select the confident subset identified early in training, thereby avoiding misleading confidence from early learning and effectively filtering out MEEs. Experiments on the CIFAR, WebVision, and full ImageNet-1k datasets demonstrate that our method effectively improves sample selection and model performance by reducing MEEs.
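
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact recalibration criterion, so the rule used here (cut an early-selected example when the later model confidently predicts a class other than its given label) is an assumption, as are the function name `early_cutting_sketch` and the parameter `conf_threshold`.

```python
from typing import List, Sequence


def early_cutting_sketch(
    labels: Sequence[int],
    early_preds: Sequence[int],
    late_probs: Sequence[Sequence[float]],
    conf_threshold: float = 0.8,  # hypothetical value, not from the paper
) -> List[bool]:
    """Return a keep-mask over examples (True = keep for training).

    Stage 1 (early selection): an example is selected if the early model's
    prediction matches its given, possibly noisy, label.
    Stage 2 (recalibration, ASSUMED criterion): drop a selected example if the
    later model confidently predicts a different class than the given label.
    """
    keep = []
    for label, early_pred, probs in zip(labels, early_preds, late_probs):
        selected_early = early_pred == label
        late_conf = max(probs)
        late_pred = list(probs).index(late_conf)  # first argmax
        suspected_mee = (
            selected_early
            and late_pred != label
            and late_conf >= conf_threshold
        )
        keep.append(selected_early and not suspected_mee)
    return keep


# Toy usage: four examples, three classes.
labels = [0, 1, 2, 1]
early_preds = [0, 1, 2, 0]  # the last example is not selected early
late_probs = [
    [0.90, 0.05, 0.05],  # later model agrees with the label: keep
    [0.05, 0.05, 0.90],  # later model confidently disagrees: cut
    [0.40, 0.20, 0.40],  # later model disagrees, but not confidently: keep
    [0.10, 0.80, 0.10],  # never selected early, so not kept
]
print(early_cutting_sketch(labels, early_preds, late_probs))
# [True, False, True, False]
```

The point of the sketch is only the control flow: the later model is used to re-check the subset chosen early, rather than to select a new subset from scratch.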

