LG · Nov 21, 2025

Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels

arXiv:2511.17040v1

Originality: Incremental advance

AI Analysis

The paper addresses the problem of noisy labels degrading model performance, a practical concern for machine learning practitioners, and offers an incremental improvement over existing methods.

The paper proposes Step-E, a framework for training deep neural networks on noisy labels that integrates sample selection and model learning into a single optimization process. It improves test accuracy on CIFAR-100N from 43.3% to 50.4% and on CIFAR-10N (aggre) from 83.9% to 85.3%.

Training data collected in the wild often contain noisy labels and outliers that substantially degrade the performance and reliability of deep neural networks. While data cleaning is commonly applied as a separate preprocessing stage, such two-stage pipelines neither fully exploit feedback from the downstream model nor adapt to unknown noise patterns. We propose Step-E, a simple framework that integrates sample selection and model learning into a single optimization process. At each epoch, Step-E ranks samples by loss and gradually increases the fraction of high-loss examples that are excluded from gradient updates after a brief warm-up stage, yielding an online curriculum that focuses on easy and consistent examples and eventually ignores persistent outliers. On CIFAR-100N, Step-E improves the test accuracy of a ResNet-18 model from 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%), clearly outperforming loss truncation, self-paced learning, and one-shot filtering while approaching the clean-label oracle at 60.5% (+/- 0.2%). On CIFAR-10N (aggre), Step-E also improves over the noisy baseline (85.3% vs. 83.9%) and nearly matches the clean-label oracle (85.9%), with only moderate training-time overhead.
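The selection rule described in the abstract (rank samples by loss, then exclude a growing fraction of the highest-loss samples once warm-up ends) can be illustrated with a minimal sketch. This is not the authors' implementation. The linear ramp, the function names `drop_fraction`, `keep_mask`, and `masked_mean_loss`, and the parameters `warmup_epochs`, `ramp_epochs`, and `max_drop` with their default values are all assumptions made for illustration, since the abstract does not specify the schedule.

```python
def drop_fraction(epoch, warmup_epochs=5, ramp_epochs=20, max_drop=0.2):
    """Fraction of highest-loss samples to exclude at a given epoch.

    Assumed schedule (not taken from the paper): nothing is dropped during
    warm-up, then the fraction grows linearly until it reaches max_drop.
    """
    if epoch < warmup_epochs:
        return 0.0
    progress = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return max_drop * progress


def keep_mask(losses, frac):
    """Return one boolean per sample: True if it still contributes to updates."""
    n_drop = int(frac * len(losses))
    if n_drop == 0:
        return [True] * len(losses)
    # Rank sample indices by loss, highest first, and drop the top n_drop.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    dropped = set(order[:n_drop])
    return [i not in dropped for i in range(len(losses))]


def masked_mean_loss(losses, mask):
    """Average the loss over kept samples only; dropped samples add no gradient."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    per_sample_losses = [0.1, 2.0, 0.3, 5.0]  # toy values, not real data
    for epoch in (0, 5, 30):
        frac = drop_fraction(epoch)
        mask = keep_mask(per_sample_losses, frac)
        print(epoch, frac, mask, masked_mean_loss(per_sample_losses, mask))
```

In a real training loop the per-sample losses would come from the model's forward pass, and the mask would be applied before averaging the loss for backpropagation, so excluded samples have no influence on the gradient update.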

