Making Risk Minimization Tolerant to Label Noise
This paper addresses label noise in machine learning applications, offering a theoretical and practical approach to more robust classification. The contribution is incremental in that it builds on existing risk minimization frameworks.
The paper tackles the problem of learning classifiers from training data corrupted by label noise. It proves that risk minimization under certain non-convex loss functions (0-1, sigmoid, ramp, probit) is tolerant to uniform label noise, and that the sigmoid, ramp, and probit losses can be made tolerant to non-uniform noise when the classes are separable under the noise-free distribution. Empirical results show that the 0-1, sigmoid, and ramp losses are much more robust to label noise than SVM.
In many applications, the training data from which one needs to learn a classifier is corrupted with label noise. Many standard algorithms, such as SVM, perform poorly in the presence of label noise. In this paper we investigate the robustness of risk minimization to label noise. We prove a sufficient condition on a loss function for risk minimization under that loss to be tolerant to uniform label noise. We show that the 0-1 loss, sigmoid loss, ramp loss and probit loss satisfy this condition, though none of the standard convex loss functions satisfy it. We also prove that, by choosing a sufficiently large value of a parameter in the loss function, the sigmoid loss, ramp loss and probit loss can also be made tolerant to non-uniform label noise, provided the classes are separable under the noise-free data distribution. Through extensive empirical studies, we show that risk minimization under the 0-1 loss, the sigmoid loss and the ramp loss is much more robust to label noise than the SVM algorithm.
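The abstract names the losses but does not state the sufficient condition itself. As a rough illustration, the sketch below writes the sigmoid, ramp, and probit losses as functions of the margin z = y f(x) and checks one property they share that the hinge loss lacks: the sum l(z) + l(-z) is constant. Treating this symmetry as the relevant condition is an assumption made here for illustration, as are the exact parameterizations, the parameter name `beta`, and the helper `symmetry_spread`; none of them is taken from the text above.

```python
import math

def sigmoid_loss(z, beta=1.0):
    # z is the margin y * f(x); beta is a hypothetical steepness parameter
    return 1.0 / (1.0 + math.exp(beta * z))

def ramp_loss(z, beta=1.0):
    # Clipped hinge-style loss taking values in [0, 2] (assumed form)
    u = beta * z
    return max(0.0, 1.0 - u) - max(0.0, -1.0 - u)

def probit_loss(z, beta=1.0):
    # 1 - Phi(beta * z), with Phi the standard normal CDF computed via erf
    return 1.0 - 0.5 * (1.0 + math.erf(beta * z / math.sqrt(2.0)))

def hinge_loss(z):
    # Standard convex hinge loss used by SVM, included for contrast
    return max(0.0, 1.0 - z)

def symmetry_spread(loss, margins):
    # Spread of l(z) + l(-z) over the grid; zero means the sum is constant
    sums = [loss(z) + loss(-z) for z in margins]
    return max(sums) - min(sums)

margins = [i / 10.0 for i in range(-50, 51)]

for name, loss in [("sigmoid", sigmoid_loss), ("ramp", ramp_loss),
                   ("probit", probit_loss), ("hinge", hinge_loss)]:
    print(name, symmetry_spread(loss, margins))
```

On this grid the spread is zero up to floating-point error for the three non-convex losses, while for the hinge loss it grows with the margin. Under a constant sum K and a uniform flip rate eta below 0.5, the expected noisy loss of any classifier is an increasing affine function of its clean loss, which is one way to see why the minimizer would be unaffected.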