LG · Dec 2, 2020

Extended T: Learning with Mixed Closed-set and Open-set Noisy Labels

arXiv:2012.00932v1 · 19 citations
AI Analysis

This work provides a more robust method for training machine learning models in the presence of complex label noise, which is a common problem for practitioners dealing with real-world datasets.

This paper addresses the problem of learning with mixed closed-set and open-set noisy labels by extending the traditional label noise transition matrix. The proposed cluster-dependent extended transition matrix and an unbiased estimator allow for more robust performance compared to prior state-of-the-art methods in both synthetic and real experiments.

The label noise transition matrix $T$, which reflects the probabilities that true labels flip into noisy ones, is of vital importance for modeling label noise and designing statistically consistent classifiers. The traditional transition matrix can only model closed-set label noise, where the true class labels of the noisy training data lie within the noisy label set. It is ill-suited to modeling open-set label noise, where some true class labels lie outside the noisy label set. Thus, in a more realistic situation where both closed-set and open-set label noise occur, existing methods give undesirably biased solutions. In addition, the traditional transition matrix can only model instance-independent label noise, which may not perform well in practice. In this paper, we focus on learning under mixed closed-set and open-set label noise. We address the aforementioned issues by extending the traditional transition matrix so that it can model mixed label noise, and by further extending it to a cluster-dependent transition matrix that better approximates the instance-dependent label noise found in real-world applications. We term the proposed transition matrix the cluster-dependent extended transition matrix. An unbiased estimator (i.e., the extended $T$-estimator) is designed to estimate the cluster-dependent extended transition matrix by exploiting only the noisy data. Comprehensive synthetic and real experiments validate that our method models mixed label noise better, as shown by its more robust performance compared with prior state-of-the-art label-noise learning methods.
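To make the role of $T$ concrete, here is a minimal Python sketch of how a transition matrix maps clean class posteriors to noisy ones. It is an illustration only, not the paper's estimator. In particular, representing all open-set classes as one extra row of the matrix is an assumption made here for the example; the abstract does not state the exact shape of the extended matrix. The function name `noisy_posterior` and all numeric values are hypothetical.

```python
import numpy as np

def noisy_posterior(clean_posterior, T):
    """Map P(true label | x) to P(noisy label | x).

    T[i, j] is the probability that true class i is observed as noisy
    label j, so every row of T must sum to 1.
    """
    clean_posterior = np.asarray(clean_posterior, dtype=float)
    T = np.asarray(T, dtype=float)
    if not np.allclose(T.sum(axis=1), 1.0):
        raise ValueError("each row of T must sum to 1")
    return clean_posterior @ T

# Closed-set case: 3 true classes, 3 noisy labels (square matrix).
T_closed = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

# Mixed case (assumption for illustration): one extra row stands for
# open-set instances, whose true class is outside the 3 noisy labels
# but which still receive one of those labels.
T_extended = np.vstack([T_closed, [1 / 3, 1 / 3, 1 / 3]])

p_closed = np.array([0.7, 0.2, 0.1])         # posterior over 3 true classes
p_mixed = np.array([0.6, 0.2, 0.1, 0.1])     # last entry: open-set mass

q_closed = noisy_posterior(p_closed, T_closed)
q_mixed = noisy_posterior(p_mixed, T_extended)
```

Both outputs are distributions over the same three noisy labels. A square matrix has no row to account for the open-set mass, which gives one intuition for why the abstract says a purely closed-set model yields biased solutions under mixed noise.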

