CV · LG · Mar 21, 2021

ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning

arXiv:2103.11395v3 · 46 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of robust machine learning on noisy real-world datasets. It is an incremental advance, since it builds on existing methods such as expectation maximization and semi-supervised learning.

The paper tackles the problem of learning from severe label noise by proposing ScanMix, a training algorithm that combines semantic clustering and semi-supervised learning, achieving state-of-the-art results on benchmarks like CIFAR-10/-100, Red Mini-ImageNet, Clothing1M, and WebVision.

We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to the state of the art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster the training images based on their appearance and classification results, and the M-step optimises the SSL classification and learns effective feature representations via semantic clustering. We present a theoretical result that shows the correctness and convergence of ScanMix, and an empirical result that shows that ScanMix has SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks with severe label noise, our results are competitive to the current SOTA.
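The alternation described in the abstract can be illustrated with a toy sketch. This is not the ScanMix implementation: it assumes a nearest-centroid classifier in place of a deep network, Euclidean nearest neighbours as the "appearance" signal, and a fixed mixing weight between the noisy labels and neighbour-smoothed predictions. All function names and parameters (`e_step`, `m_step`, `k`, `mix`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict(features, centroids):
    """Softmax over negative squared distances to each class centroid."""
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def e_step(features, probs, k=5):
    """Toy E-step (assumption): pick k nearest neighbours by appearance and
    weight each one by how likely it shares the sample's predicted class."""
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]          # shape (n, k)
    agree = np.einsum("nc,nkc->nk", probs, probs[nbrs])
    return nbrs, agree

def m_step(features, noisy_onehot, probs, nbrs, agree, mix=0.5):
    """Toy M-step (assumption): refit centroids on targets that blend the
    noisy labels with neighbour-smoothed predictions."""
    w = agree / (agree.sum(axis=1, keepdims=True) + 1e-12)
    nbr_pred = np.einsum("nk,nkc->nc", w, probs[nbrs])
    targets = mix * noisy_onehot + (1.0 - mix) * nbr_pred
    return (targets.T @ features) / (targets.sum(axis=0)[:, None] + 1e-12)

def demo(n=400, noise_rate=0.3, iters=5, seed=0):
    """Two Gaussian blobs with symmetric label noise; returns accuracy
    measured against the clean labels."""
    rng = np.random.default_rng(seed)
    true = rng.integers(0, 2, size=n)
    means = np.array([[-3.0, 0.0], [3.0, 0.0]])
    features = means[true] + rng.normal(size=(n, 2))
    flip = rng.random(n) < noise_rate
    noisy = np.where(flip, 1 - true, true)
    noisy_onehot = np.eye(2)[noisy]
    # Initialise the classifier from the noisy labels alone.
    centroids = (noisy_onehot.T @ features) / noisy_onehot.sum(axis=0)[:, None]
    for _ in range(iters):
        probs = predict(features, centroids)
        nbrs, agree = e_step(features, probs)
        centroids = m_step(features, noisy_onehot, probs, nbrs, agree)
    return float((predict(features, centroids).argmax(axis=1) == true).mean())

if __name__ == "__main__":
    print(f"accuracy against clean labels: {demo():.3f}")
```

The sketch is meant to show only the shape of the loop: a latent neighbour structure is re-estimated from the current features and predictions, and the classifier is then refit using that structure. The clustering and SSL objectives of the actual method are described in the paper itself.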

Code Implementations: 1 repo
