LG · May 3, 2016

Learning from Binary Labels with Instance-Dependent Corruption

arXiv:1605.00751v2 · 44 citations
AI Analysis

This paper addresses a fundamental challenge in robust machine learning: learning when label noise depends on the instance. Its contribution is incremental, however, because it builds on existing theoretical frameworks.

The paper tackles the problem of learning from binary labels corrupted by instance- and label-dependent noise, proving that consistent classification on noisy data implies consistency on clean data and that the Isotron can efficiently learn under certain noise models.

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise. With sufficiently many such samples, can we optimally classify and rank instances with respect to the noise-free distribution? We provide a theoretical analysis of this question, with three main contributions. First, we prove that for instance-dependent noise, any algorithm that is consistent for classification on the noisy distribution is also consistent on the clean distribution. Second, we prove that for a broad class of instance- and label-dependent noise, a similar consistency result holds for the area under the ROC curve. Third, for the latter noise model, when the noise-free class-probability function belongs to the generalised linear model family, we show that the Isotron can efficiently and provably learn from the corrupted sample.
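To make the third contribution more concrete, the following is a minimal sketch of the standard Isotron iteration, which alternates an isotonic-regression fit of the link function with a perceptron-style update of the weight vector. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`pav`, `isotron`, `demo`), the sigmoid link, the Gaussian features, and the score-dependent flip probability `0.3 * exp(-s**2)` are all hypothetical choices made here. The flip probability was picked only because it keeps the noisy class-probability monotone in the clean score; it is not claimed to be the noise model defined in the paper.

```python
import numpy as np


def pav(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks = []  # each block: [weighted sum, total weight, number of inputs merged]
    for v, wt in zip(values, weights):
        blocks.append([v * wt, wt, 1])
        # Merge neighbouring blocks while their means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] += k
    return np.concatenate([np.full(k, s / c) for s, c, k in blocks])


def isotron(X, y, n_iter=30):
    """Alternate an isotonic fit of the link with an additive update of w."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        scores = X @ w
        # Pool tied scores so the fitted link is a function of the score alone.
        _, inverse = np.unique(scores, return_inverse=True)
        counts = np.bincount(inverse)
        means = np.bincount(inverse, weights=y) / counts
        u = pav(means, counts)[inverse]  # fitted link value for each sample
        w = w + (y - u) @ X / n          # perceptron-style update
    return w


def demo(n=5000, d=5, seed=0):
    """Synthetic check: recover the direction of w_true from corrupted labels."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    w_true *= 2.0 / np.linalg.norm(w_true)
    X = rng.normal(size=(n, d))
    s = X @ w_true
    eta = 1.0 / (1.0 + np.exp(-s))            # assumed clean class-probability (GLM)
    y_clean = (rng.random(n) < eta).astype(float)
    flip_prob = 0.3 * np.exp(-s ** 2)         # assumed noise: more flips near the boundary
    flip = rng.random(n) < flip_prob
    y_noisy = np.where(flip, 1.0 - y_clean, y_clean)
    w_hat = isotron(X, y_noisy)
    cos = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
    return w_hat, cos
```

Calling `demo()` trains only on the corrupted labels and returns the learned weights together with their cosine similarity to `w_true`. Because the link is refit at every step, the scale of `w_hat` is not identified, so the direction is the meaningful quantity to compare.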

