A Competitive Algorithm for Agnostic Active Learning
This addresses the inefficiency of prior algorithms in agnostic active learning, providing a broadly applicable solution with proven hardness results, though it builds incrementally on existing splitting-based approaches.
The paper tackles the problem of agnostic active learning by developing an algorithm that is competitive with the optimal algorithm for any binary hypothesis class and distribution, achieving O(η) error with O(m* log |H|) queries if the optimal uses m* queries.
For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(η)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(η)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($η= 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.