LG · Nov 5, 2020

Binary classification with ambiguous training data

Naoya Otani, Yosuke Otsubo, Tetsuya Koike, Masashi Sugiyama

arXiv:2011.02598v1 · 8 citations

AI Analysis

This paper addresses a specific problem in supervised learning, namely scenarios where domain experts struggle to label some samples, but it is incremental because it builds on existing reject-option frameworks.

The paper tackles binary classification when training data includes ambiguous samples that are hard to label, proposing a method that extends binary classification with a reject option by incorporating these ambiguous samples into a new loss function. Numerical experiments show the method successfully utilizes ambiguous training data, though no concrete performance numbers are provided.

In supervised learning, we often face ambiguous (A) samples that are difficult to label, even for domain experts. In this paper, we consider a binary classification problem in the presence of such A samples. This problem is substantially different from semi-supervised learning, since unlabeled samples are not necessarily difficult samples. It is also different from 3-class classification with the positive (P), negative (N), and A classes, since we do not want to classify test samples into the A class. Our proposed method extends binary classification with a reject option, which trains a classifier and a rejector simultaneously using P and N samples based on the 0-1-$c$ loss with rejection cost $c$. More specifically, we propose to train a classifier and a rejector under the 0-1-$c$-$d$ loss using P, N, and A samples, where $d$ is the misclassification penalty for ambiguous samples. In our practical implementation, we use a convex upper bound of the 0-1-$c$-$d$ loss for computational tractability. Numerical experiments demonstrate that our method can successfully utilize the additional information brought by such A training data.
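To make the loss described in the abstract concrete, the following is a minimal sketch of a per-sample 0-1-$c$-$d$ loss in Python. The abstract fixes only part of the definition: a P or N sample costs 0 when classified correctly, 1 when misclassified, and $c$ when rejected, and $d$ is the misclassification penalty for A samples. Everything else here is an assumption made for illustration, not taken from the paper: the label encoding (`P = 1`, `N = -1`, `A = 0`), the function name `zero_one_c_d_loss`, the default values of `c` and `d`, and in particular the choice that an A sample costs $d$ whenever the rejector accepts it and 0 when it is rejected. The sketch evaluates the non-convex loss directly; it does not implement the convex upper bound the authors train with.

```python
import numpy as np

# Hypothetical label encoding, chosen for this sketch only.
P, N, A = 1, -1, 0


def zero_one_c_d_loss(y_true, y_pred, rejected, c=0.3, d=0.5):
    """Per-sample 0-1-c-d loss under the assumptions stated in the lead-in.

    y_true   : labels in {P, N, A}
    y_pred   : classifier outputs in {P, N}
    rejected : boolean mask, True where the rejector abstains
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rejected = np.asarray(rejected, dtype=bool)

    is_a = y_true == A
    loss = np.zeros(y_true.shape, dtype=float)

    # P/N samples: rejection costs c.
    loss[~is_a & rejected] = c
    # P/N samples: an accepted but wrong prediction costs 1.
    loss[~is_a & ~rejected & (y_pred != y_true)] = 1.0
    # A samples: assumed to cost d when accepted (any label is "wrong").
    loss[is_a & ~rejected] = d
    # A samples that are rejected keep loss 0 (assumption).
    return loss


y_true = [P, N, P, A, A]
y_pred = [P, P, N, P, N]
rejected = [False, False, True, False, True]
losses = zero_one_c_d_loss(y_true, y_pred, rejected, c=0.3, d=0.5)
print(losses, losses.mean())
```

Under these assumptions, rejecting A samples is free while rejecting P or N samples costs $c$, which is one way a rejector could be encouraged to abstain on inputs that resemble the ambiguous training data.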
