LG · Oct 19, 2020

Importance Reweighting for Biquality Learning

arXiv:2010.09621v5 · 6 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses weakly supervised learning with label noise. It offers machine learning practitioners a generic solution that handles multiple noise types, although it builds incrementally on the existing biquality data setting.

The paper tackles the problem of learning from datasets with various types of label noise by proposing a new reweighting scheme that identifies noncorrupted examples in an untrusted dataset, using a small trusted dataset. The approach outperforms baselines and state-of-the-art methods in extensive experiments simulating different noise types and dataset qualities.

The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies", namely poor quality, non-adaptability, and insufficient quantity of labels. Regarding quality, label noise can be of different types: completely at random, at random, or even not at random. All these kinds of label noise are addressed separately in the literature, which has led to highly specialized approaches. This paper proposes an original, encompassing view of Weakly Supervised Learning, which results in the design of generic approaches capable of dealing with any kind of label noise. For this purpose, an alternative setting called "Biquality data" is used. It assumes that a small trusted dataset of correctly labeled examples is available in addition to an untrusted dataset of noisy examples. In this paper, we propose a new reweighting scheme capable of identifying noncorrupted examples in the untrusted dataset. This allows one to learn classifiers using both datasets. Extensive experiments that simulate several types of label noise and vary the quality and quantity of untrusted examples demonstrate that the proposed approach outperforms baselines and state-of-the-art approaches.
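The abstract describes reweighting untrusted examples so that those likely to be correctly labeled count more when training on both datasets. Below is a minimal sketch of one way to do that in the biquality setting. It assumes (our assumption, not necessarily the paper's exact estimator) that each untrusted pair (x, y) gets the weight P_T(y|x) / P_U(y|x), where the two conditional label distributions are estimated on the trusted and untrusted data respectively, and trusted examples keep weight 1. To stay dependency-free it uses discrete features and Laplace-smoothed counts as a stand-in for real probabilistic classifiers; every function name here is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: importance weights for untrusted examples,
# ASSUMING weight(x, y) = P_trusted(y | x) / P_untrusted(y | x).


def conditional_label_probs(data, labels, alpha=1.0):
    """Laplace-smoothed estimate of P(y | x) for discrete x.

    data: list of (x, y) pairs. Returns {x: {y: probability}}.
    """
    counts = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1
    probs = {}
    for x, c in counts.items():
        total = sum(c.values()) + alpha * len(labels)
        probs[x] = {y: (c[y] + alpha) / total for y in labels}
    return probs


def prob(probs, x, y, labels):
    # Feature values never seen in a dataset fall back to a uniform guess.
    return probs.get(x, {}).get(y, 1.0 / len(labels))


def importance_weights(trusted, untrusted, labels, alpha=1.0):
    """Weight each untrusted (x, y) by the trusted/untrusted probability ratio."""
    p_t = conditional_label_probs(trusted, labels, alpha)
    p_u = conditional_label_probs(untrusted, labels, alpha)
    return [prob(p_t, x, y, labels) / prob(p_u, x, y, labels) for x, y in untrusted]


def weighted_fit(trusted, untrusted, weights):
    """Toy weighted classifier: weighted majority label per feature value."""
    votes = defaultdict(Counter)
    for x, y in trusted:
        votes[x][y] += 1.0  # trusted examples keep weight 1
    for (x, y), w in zip(untrusted, weights):
        votes[x][y] += w
    return {x: c.most_common(1)[0][0] for x, c in votes.items()}


# Toy data: in the untrusted set, most labels for feature "a" are flipped.
labels = [0, 1]
trusted = [("a", 0)] * 2 + [("b", 1)] * 2
untrusted = [("a", 0)] * 3 + [("a", 1)] * 6 + [("b", 1)] * 4

w = importance_weights(trusted, untrusted, labels)
model = weighted_fit(trusted, untrusted, w)
baseline = weighted_fit(trusted, untrusted, [1.0] * len(untrusted))
print(model["a"], baseline["a"])  # -> 0 1
```

In this toy run, the flipped pairs ("a", 1) receive weights below 1 and the clean pairs ("a", 0) receive weights above 1, so the reweighted model recovers label 0 for "a" while the unweighted baseline follows the noise. In practice the count tables would be replaced by calibrated probabilistic classifiers, and the weights passed to any learner that accepts per-sample weights.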

Code Implementations: 1 repo