ML LGOct 22, 2014

Active Regression by Stratification

arXiv:1410.5920v138 citations

Originality Incremental advance

AI Analysis

This work addresses the problem of efficient data collection in regression for machine learning practitioners, offering a novel active learning approach that is incremental in improving convergence constants rather than rates.

The paper tackles active learning for parametric linear regression with random design, providing finite sample convergence guarantees for misspecified models and showing that their algorithm can improve over passive learning by optimizing the distribution-dependent risk constant.

We propose a new active learning algorithm for parametric linear regression with random design. We provide finite sample convergence guarantees for general distributions in the misspecified model. This is the first active learner for this setting that provably can improve over passive learning. Unlike other learning settings (such as classification), in regression the passive learning rate of $O(1/ε)$ cannot in general be improved upon. Nonetheless, the so-called `constant' in the rate of convergence, which is characterized by a distribution-dependent risk, can be improved in many cases. For a given distribution, achieving the optimal risk requires prior knowledge of the distribution. Following the stratification technique advocated in Monte-Carlo function integration, our active learner approaches the optimal risk using piecewise constant approximations.

View on arXiv PDF

Similar