CV · Jan 8, 2025

Open set label noise learning with robust sample selection and margin-guided module

Yuandi Zhao, Qianxi Xia, Yang Sun, Zhijie Wen, Liyan Ma, Shihui Ying

arXiv:2501.04269v16.27 citations · h-index: 3 · Knowledge-Based Systems

Originality: Incremental advance

AI Analysis

The paper addresses a critical issue in real-world computer vision datasets, where label noise can degrade model performance. The advance is incremental, since it builds on existing label noise learning methods.

The paper tackles the problem of open set label noise in deep learning, where some training samples belong to unknown classes outside the known label space. It introduces a method called RSS-MGM that combines a robust sample selection module with a margin-guided module to better distinguish and handle such noise, and it is reported to outperform many state-of-the-art label noise learning methods on benchmark datasets such as CIFAR-100N-C and Food101N.

In recent years, the remarkable success of deep neural networks (DNNs) in computer vision has been largely due to large-scale, high-quality labeled datasets. Training directly on real-world datasets with label noise may result in overfitting. Traditional methods are limited to closed set label noise, where the true class labels of the noisy training data lie within the known label space. However, some real-world datasets contain open set label noise, meaning that some samples belong to an unknown class outside the known label space. To address the open set label noise problem, we introduce a method based on Robust Sample Selection and Margin-Guided Module (RSS-MGM). Firstly, unlike prior clean sample selection approaches, which select only a limited number of clean samples, a robust sample selection module selects samples by either small loss or high confidence to obtain more clean samples. Secondly, to efficiently distinguish open set label noise from closed set label noise, margin functions are designed to separate open set data from closed set data. Thirdly, different processing methods are applied to different types of samples in order to fully utilize the data's prior information and optimize the whole model. Furthermore, extensive experiments on noisy labeled data from benchmark and real-world datasets, such as CIFAR-100N-C, CIFAR80N-O, WebFG-469, and Food101N, indicate that our approach outperforms many state-of-the-art label noise learning methods. In particular, it more accurately separates open set label noise samples from closed set ones.
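
The first two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the margin functions, the thresholds, or the selection ratio, so every name and value here (`select_clean`, `split_noisy`, `keep_ratio`, `conf_threshold`, `margin_threshold`) is a hypothetical choice, and the margin is assumed to be the gap between the two largest softmax probabilities purely for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one sample's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Per-sample loss against the (possibly noisy) given label.
    return -math.log(max(probs[label], 1e-12))

def select_clean(batch_logits, labels, keep_ratio=0.5, conf_threshold=0.9):
    """Treat a sample as clean if it is among the smallest-loss samples
    OR the model is confident in its given label (assumed criteria)."""
    probs = [softmax(row) for row in batch_logits]
    losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
    n_keep = int(keep_ratio * len(labels))
    by_loss = sorted(range(len(labels)), key=lambda i: losses[i])
    small_loss = set(by_loss[:n_keep])
    confident = {i for i, (p, y) in enumerate(zip(probs, labels))
                 if p[y] >= conf_threshold}
    return sorted(small_loss | confident)

def split_noisy(batch_logits, noisy_indices, margin_threshold=0.3):
    """Hypothetical margin rule: a large top-1/top-2 probability gap suggests
    a known class (closed set noise); a small gap suggests an unknown class."""
    closed_set, open_set = [], []
    for i in noisy_indices:
        top = sorted(softmax(batch_logits[i]), reverse=True)
        margin = top[0] - top[1]
        (closed_set if margin >= margin_threshold else open_set).append(i)
    return closed_set, open_set

# Toy batch: 4 samples, 3 known classes.
logits = [[5.0, 0.0, 0.0],   # agrees with its label
          [0.0, 4.0, 0.0],   # agrees with its label
          [4.0, 0.0, 0.0],   # confidently predicts a different known class
          [0.2, 0.1, 0.0]]   # near-uniform prediction
labels = [0, 1, 2, 1]

clean = select_clean(logits, labels, keep_ratio=0.25)
noisy = [i for i in range(len(labels)) if i not in clean]
closed_set, open_set = split_noisy(logits, noisy)
print(clean, closed_set, open_set)
```

With `keep_ratio=0.25`, small-loss selection alone keeps only sample 0, and the confidence criterion also admits sample 1, which is the sense in which the union yields more clean samples than a single criterion. How the method then processes each group (the third step) is not specified in the abstract and is left out of the sketch.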
