LG MLApr 5, 2014

A Compression Technique for Analyzing Disagreement-Based Active Learning

Yair Wiener, Steve Hanneke, Ran El-Yaniv

arXiv:1404.1504v121 citations

Originality Incremental advance

AI Analysis

This work provides incremental improvements in theoretical understanding for researchers in active learning, with specific applications to certain data distributions.

The paper tackles the problem of analyzing label complexity in disagreement-based active learning by introducing a new characterization based on version space compression set size, resulting in tight analyses for CAL and refined bounds for linear separators and axis-aligned rectangles.

We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the training data that induces the same version space. We show various applications of the new characterization, including a tight analysis of CAL and refined label complexity bounds for linear separators under mixtures of Gaussians and axis-aligned rectangles under product densities. The version space compression set size, as well as the new characterization of the label complexity, can be naturally extended to agnostic learning problems, for which we show new speedup results for two well known active learning algorithms.

View on arXiv PDF

Similar