LG, ML · Feb 3, 2020

Learning from Noisy Similar and Dissimilar Data

arXiv:2002.00995v1 · 5.09 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of using weak supervision for classification when standard labeled data are hard to obtain, particularly under label noise. It is an incremental advance, building on existing methods for learning from pairwise supervision.

The paper tackles the problem of learning a classifier from noisy pairwise similarity and dissimilarity labels, which are common in privacy-sensitive domains, and shows that its noise-informed algorithms outperform noise-blind baselines on synthetic and real-world datasets.

With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise: in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although this problem has been looked at recently, it is unclear how to learn from such supervision under label noise, which is very common when the supervision is crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled data. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D data. We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data. Finally, we perform experiments on synthetic and real-world datasets and show that our noise-informed algorithms outperform noise-blind baselines in learning from noisy pairwise data.
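To make the link between S/D pairs and ordinary class labels concrete, here is a minimal sketch. It is not the paper's algorithm and does not reproduce its two noise models. It assumes binary labels in {+1, -1}, encodes a pair as S (+1) or D (-1) via the product y1 * y2, and assumes an illustrative noise model in which each point label is flipped independently with probability rho. Under that assumption a pair label is wrong exactly when one of its two point labels flips, i.e. with probability 2 * rho * (1 - rho). The `corrected_loss` helper is a swapped-in technique, the unbiased loss correction for symmetric label noise of Natarajan et al. (2013), applied here to pair labels. It illustrates what "noise-informed" can mean in general. All function names are hypothetical.

```python
import math
import random


def pair_label(y1: int, y2: int) -> int:
    """+1 for a Similar (S) pair, -1 for a Dissimilar (D) pair; labels in {+1, -1}."""
    return y1 * y2


def pair_flip_rate(rho: float) -> float:
    """Pair-label error rate under the ASSUMED model where each point label flips
    independently with probability rho: the product y1*y2 changes sign iff
    exactly one of the two labels flips."""
    return 2 * rho * (1 - rho)


def simulate_pair_flip_rate(n_pairs: int, rho: float, seed: int = 0) -> float:
    """Monte Carlo estimate of the pair-label error rate under the same assumption."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_pairs):
        y1, y2 = rng.choice((1, -1)), rng.choice((1, -1))
        # Flip each point label independently with probability rho.
        n1 = -y1 if rng.random() < rho else y1
        n2 = -y2 if rng.random() < rho else y2
        if pair_label(n1, n2) != pair_label(y1, y2):
            wrong += 1
    return wrong / n_pairs


def logistic_loss(t: float, s: int) -> float:
    """Logistic loss of a real-valued pair score t against a pair label s in {+1, -1}."""
    return math.log1p(math.exp(-s * t))


def corrected_loss(t: float, s_noisy: int, e: float) -> float:
    """Unbiased loss correction for symmetric label noise with flip rate e < 0.5
    (Natarajan et al., 2013), used here on noisy pair labels (illustrative only)."""
    return ((1 - e) * logistic_loss(t, s_noisy) - e * logistic_loss(t, -s_noisy)) / (1 - 2 * e)


if __name__ == "__main__":
    rho = 0.1
    e = pair_flip_rate(rho)
    print(round(e, 4))  # → 0.18
    print(simulate_pair_flip_rate(100_000, rho))
    # Averaged over the noise, the corrected loss equals the clean loss.
    t, s = 0.7, 1
    print((1 - e) * corrected_loss(t, s, e) + e * corrected_loss(t, -s, e), logistic_loss(t, s))
```

One design note: because 2 * rho * (1 - rho) < 0.5 whenever rho < 0.5, the pair labels stay informative under this assumed model, so a correction such as the one sketched above remains well defined. A noise-blind learner would instead minimize `logistic_loss` directly on the noisy pair labels.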

