LGCGMLJul 8, 2019

The Power of Comparisons for Actively Learning Linear Classifiers

arXiv:1907.03816v227 citations
Originality Highly original
AI Analysis

This addresses the challenge of costly labeling in big data for machine learning practitioners by providing a method to significantly cut down on labeled data needs, though it is incremental as it builds on existing active learning frameworks with new query types.

The paper tackles the problem of reducing labeled sample complexity in active learning for linear classifiers by introducing comparison queries and weak distributional assumptions, showing that this approach requires exponentially fewer samples than supervised alternatives and also works effectively in the Reliable and Probably Useful (RPU) learning model with at most logarithmic additional cost.

In the world of big data, large but costly to label datasets dominate many fields. Active learning, a semi-supervised alternative to the standard PAC-learning model, was introduced to explore whether adaptive labeling could learn concepts with exponentially fewer labeled samples. While previous results show that active learning performs no better than its supervised alternative for important concept classes such as linear separators, we show that by adding weak distributional assumptions and allowing comparison queries, active learning requires exponentially fewer samples. Further, we show that these results hold as well for a stronger model of learning called Reliable and Probably Useful (RPU) learning. In this model, our learner is not allowed to make mistakes, but may instead answer "I don't know." While previous negative results showed this model to have intractably large sample complexity for label queries, we show that comparison queries make RPU-learning at worst logarithmically more expensive in both the passive and active regimes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes