ML LG Feb 13, 2021

Learning from Similarity-Confidence Data

Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

arXiv:2102.06879v1 14.428 citations

Originality: Incremental advance

AI Analysis

This work addresses a novel weakly supervised learning problem to reduce labeling costs, though it appears incremental as it builds on existing risk estimation frameworks.

The paper tackles the problem of learning binary classifiers from unlabeled data pairs annotated with similarity confidence. It proposes an unbiased risk estimator whose estimation error bound achieves the optimal convergence rate, and adds a risk correction scheme to alleviate overfitting when flexible models are used.

Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data. In this paper, we investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data, where we aim to learn an effective binary classifier from only unlabeled data pairs equipped with confidence that illustrates their degree of similarity (two examples are similar if they belong to the same class). To solve this problem, we propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate. To alleviate potential overfitting when flexible models are used, we further employ a risk correction scheme on the proposed risk estimator. Experimental results demonstrate the effectiveness of the proposed methods.
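To make the idea of a risk estimator computed from only Sconf data more concrete, here is a minimal sketch. It is not taken from the abstract. It assumes that the two examples in a pair are drawn independently, that the positive class prior `pi_plus` is known and differs from 0.5, and that the loss is the logistic loss. Under those assumptions, averaging the similarity confidence s(x, x') over x' gives pi_minus + p(y=+1|x)(pi_plus - pi_minus), which can be solved for the class posterior and substituted into the usual classification risk. The function names, the choice of loss, and the absolute-value correction are illustrative assumptions and may differ from the scheme used in the paper.

```python
import math


def logistic_loss(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def sconf_risk(fx, fx_prime, s, pi_plus, corrected=False):
    """Empirical classification risk from similarity-confidence pairs.

    fx, fx_prime : classifier outputs f(x_i), f(x'_i) for each pair
    s            : similarity confidence p(y_i = y'_i | x_i, x'_i) per pair
    pi_plus      : positive class prior (assumed known, must not be 0.5)
    corrected    : if True, take absolute values of the two partial risks
                   (a hypothetical correction, used here only to illustrate
                   keeping the empirical risk non-negative)
    """
    pi_minus = 1.0 - pi_plus
    denom = pi_plus - pi_minus
    if abs(denom) < 1e-12:
        raise ValueError("pi_plus must differ from 0.5")
    n = len(s)
    risk_pos = 0.0  # part of the risk that treats examples as positive
    risk_neg = 0.0  # part of the risk that treats examples as negative
    for a, b, conf in zip(fx, fx_prime, s):
        pos_loss = logistic_loss(a) + logistic_loss(b)    # loss for label +1
        neg_loss = logistic_loss(-a) + logistic_loss(-b)  # loss for label -1
        risk_pos += (conf - pi_minus) * pos_loss
        risk_neg += (pi_plus - conf) * neg_loss
    risk_pos /= 2.0 * n * denom
    risk_neg /= 2.0 * n * denom
    if corrected:
        return abs(risk_pos) + abs(risk_neg)
    return risk_pos + risk_neg
```

The per-pair weights `(conf - pi_minus)` and `(pi_plus - conf)` can be negative, so the uncorrected empirical risk can drop below zero when a flexible model fits the training pairs too closely. That is the kind of overfitting a risk correction is meant to counter.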
