LG · May 3, 2016

Learning from Binary Labels with Instance-Dependent Corruption

Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

arXiv:1605.00751v2 · 13.444 citations

Originality: Incremental advance

AI Analysis

This paper addresses a fundamental challenge in robust machine learning: learning when label noise is complex and instance-dependent. The advance is incremental, since it builds on existing theoretical frameworks.

The paper tackles the problem of learning from binary labels corrupted by instance- and label-dependent noise, proving that consistent classification on noisy data implies consistency on clean data and that the Isotron can efficiently learn under certain noise models.
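For readers unfamiliar with the Isotron, the following is a minimal sketch of its basic form as commonly described: it alternates an isotonic (monotone) fit of the link function with a perceptron-like weight update. It is not taken from the paper, whose exact variant and noise assumptions may differ. The names `isotonic_fit` and `isotron`, and the toy data, are illustrative assumptions.

```python
def isotonic_fit(scores, y):
    """Least-squares non-decreasing fit of y against scores (pool adjacent violators).

    Points with identical scores are pooled first so the fit is a function of the score.
    """
    order = sorted(range(len(y)), key=lambda i: scores[i])
    groups = []  # each group: [sum of labels, list of point indices]
    for i in order:
        if groups and scores[i] == scores[groups[-1][1][-1]]:
            groups[-1][0] += y[i]
            groups[-1][1].append(i)
        else:
            groups.append([y[i], [i]])
    blocks = []
    for s, members in groups:
        blocks.append([s, members])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / len(blocks[-2][1]) > blocks[-1][0] / len(blocks[-1][1])):
            s2, m2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += m2
    fitted = [0.0] * len(y)
    for s, members in blocks:
        for i in members:
            fitted[i] = s / len(members)
    return fitted


def isotron(X, y, n_iters=50):
    """Basic Isotron loop: fit a monotone link, then nudge the weights by the residuals."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
        u = isotonic_fit(scores, y)  # current link estimate evaluated at each point
        # w <- w + (1/n) * sum_i (y_i - u_i) * x_i
        for j in range(d):
            w[j] += sum((y[i] - u[i]) * X[i][j] for i in range(n)) / n
    return w


# Toy, hand-made data (an assumption for illustration): one feature, two points.
X_toy = [[1.0], [-1.0]]
y_toy = [1, 0]
print(isotron(X_toy, y_toy, n_iters=5))  # [0.5]
```

On this toy input the first update moves the weight to 0.5, after which the isotonic fit matches the labels exactly and the weight stops changing.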

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise. With sufficiently many such samples, can we optimally classify and rank instances with respect to the noise-free distribution? We provide a theoretical analysis of this question, with three main contributions. First, we prove that for instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. Second, we prove that for a broad class of instance- and label-dependent noise, a similar consistency result holds for the area under the ROC curve. Third, for the latter noise model, when the noise-free class-probability function belongs to the generalised linear model family, we show that the Isotron can efficiently and provably learn from the corrupted sample.
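The first contribution can be illustrated with a small numerical check. This is a sketch under stated assumptions, not the paper's proof: each label is flipped with a probability rho(x) that depends on the instance but not on the label, and rho(x) stays below 1/2. In that case the noisy class-probability is (1 - 2 rho(x)) eta(x) + rho(x), so its distance from 1/2 is (1 - 2 rho(x)) times that of eta(x), and thresholding at 1/2 gives the same classifier on both distributions. The functions `eta`, `rho`, and `noisy_eta` below are hypothetical choices made only for illustration.

```python
import math


def eta(x):
    """Hypothetical clean class-probability P(Y = 1 | x): a sigmoid."""
    return 1.0 / (1.0 + math.exp(-3.0 * x))


def rho(x):
    """Hypothetical instance-dependent flip probability, always below 1/2."""
    return 0.45 * math.exp(-x * x)


def noisy_eta(x):
    """P(noisy Y = 1 | x) when each label is flipped with probability rho(x)."""
    return (1.0 - rho(x)) * eta(x) + rho(x) * (1.0 - eta(x))


def bayes_label(p):
    """Threshold a class-probability at 1/2."""
    return 1 if p > 0.5 else 0


# Grid on [-1.95, 1.95] that skips x = 0, where eta(x) is exactly 1/2.
grid = [i / 10.0 - 2.0 + 0.05 for i in range(40)]
agree = all(bayes_label(eta(x)) == bayes_label(noisy_eta(x)) for x in grid)
print(agree)
```

The noise is deliberately strongest near the decision boundary (rho close to 0.45 around x = 0), which is where one might expect it to do the most damage; the thresholded predictions still coincide.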
