CV · MM · Aug 6, 2020

Salvage Reusable Samples from Noisy Data for Robust Learning

arXiv:2008.02427v1 · 52 citations
AI Analysis

This work addresses label noise in web images for fine-grained recognition, offering an incremental improvement over existing sample selection methods by reusing hard and mislabeled examples.

The paper tackles the problem of training deep fine-grained models with noisy web image labels by proposing CRSSC, a method that identifies and corrects reusable samples to improve robustness, achieving state-of-the-art results on benchmarks like CUB-200-2011 and Stanford Cars.

Due to label noise in web images and the high memorization capacity of deep neural networks, deep fine-grained (FG) models trained directly on web images tend to have inferior recognition ability. In the literature, loss correction methods try to alleviate this issue by estimating the noise transition matrix, but inevitable false corrections cause severe accumulated errors. Sample selection methods instead treat small-loss samples as clean ("easy") ones, which alleviates the accumulated errors. However, they also discard "hard" and mislabeled examples, both of which can boost the robustness of FG models. To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep FG models with web images. Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks. We demonstrate the superiority of the proposed approach from both theoretical and experimental perspectives.
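A minimal sketch of the per-batch selection step described above, in plain Python. It rests on assumptions not stated in the abstract: clean samples are chosen with the common small-loss rule (keep the lowest-loss fraction of the batch), "certainty" is taken as one minus the normalized prediction entropy, and a reusable sample is a high-loss sample with a confident prediction, corrected by relabeling it to its predicted class. The names select_and_correct, drop_rate and certainty_threshold are hypothetical, and the paper's actual certainty criterion and correction rule may differ.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Prediction entropy divided by log(num_classes), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def select_and_correct(
    logits_batch: Sequence[Sequence[float]],
    labels: Sequence[int],
    drop_rate: float,
    certainty_threshold: float,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Hypothetical sketch: split a batch into clean and reusable samples.

    Returns (clean_indices, reusable), where reusable holds
    (index, corrected_label) pairs. All other samples are discarded.
    """
    num_classes = len(logits_batch[0])
    assert num_classes >= 2, "entropy normalization needs at least 2 classes"
    probs = [softmax(z) for z in logits_batch]
    # Per-sample cross-entropy against the (possibly noisy) web label.
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(probs, labels)]

    # Small-loss rule: keep the lowest-loss (1 - drop_rate) fraction as clean.
    n_keep = int(round(len(labels) * (1.0 - drop_rate)))
    order = sorted(range(len(labels)), key=lambda i: losses[i])
    clean = sorted(order[:n_keep])

    # Assumed reuse rule: a high-loss sample whose prediction is confident
    # (low normalized entropy) is kept and relabeled to its predicted class.
    reusable = []
    for i in order[n_keep:]:
        certainty = 1.0 - normalized_entropy(probs[i])
        if certainty >= certainty_threshold:
            predicted = probs[i].index(max(probs[i]))
            reusable.append((i, predicted))
    return clean, reusable


if __name__ == "__main__":
    logits = [[5.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.1, 0.0, 0.05]]
    labels = [0, 0, 0, 2]
    print(select_and_correct(logits, labels, drop_rate=0.5, certainty_threshold=0.5))
```

Running it prints ([0, 1], [(2, 1)]): samples 0 and 1 are kept as clean, sample 2 (labeled 0 but confidently predicted as class 1) is salvaged with the corrected label 1, and the near-uniform sample 3 is dropped. In real training the logits would come from the network, and small-loss methods often ramp the drop rate up over epochs; both are outside this sketch.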

Code implementations: 1 repo
