ML · LG · Feb 9, 2016

Online Active Linear Regression via Thresholding

Carlos Riquelme, Ramesh Johari, Baosen Zhang

arXiv:1602.02845v4 · 10.823 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient data collection for regression modeling in resource-constrained settings, representing an incremental improvement with specific algorithmic contributions.

The paper tackles online active learning for linear regression under a limited experimentation budget. It proposes a threshold-based algorithm for selecting informative observations. In the authors' simulations, the algorithm significantly reduces both the mean and the variance of the squared error relative to passive random sampling on real-world datasets with high nonlinearity and high dimensionality.

We consider the problem of online active learning to collect data for regression modeling. Specifically, we consider a decision maker with a limited experimentation budget who must efficiently learn an underlying linear population model. Our main contribution is a novel threshold-based algorithm for selection of most informative observations; we characterize its performance and fundamental lower bounds. We extend the algorithm and its guarantees to sparse linear regression in high-dimensional settings. Simulations suggest the algorithm is remarkably robust: it provides significant benefits over passive random sampling in real-world datasets that exhibit high nonlinearity and high dimensionality --- significantly reducing both the mean and variance of the squared error.
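
To make the idea of threshold-based selection concrete, here is a minimal Python sketch. The abstract does not state the selection rule, so the rule used here (label a point when the Euclidean norm of its covariate vector exceeds a fixed threshold, until the budget runs out) is an assumption for illustration, not the authors' algorithm. The function names, the synthetic Gaussian data, and the quantile-based choice of threshold are all hypothetical.

```python
import numpy as np


def select_by_threshold(X, budget, threshold):
    """Scan rows of X in arrival order and return the indices chosen for labeling.

    Assumed rule (not taken from the source): label x_t when
    ||x_t|| >= threshold, stopping once `budget` labels have been spent.
    """
    chosen = []
    for t, x in enumerate(X):
        if len(chosen) >= budget:
            break
        if np.linalg.norm(x) >= threshold:
            chosen.append(t)
    return np.array(chosen, dtype=int)


def ols(X, y):
    """Ordinary least squares via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def compare_once(seed, n=5000, d=10, budget=200, noise=1.0):
    """Return (active_error, passive_error) for one synthetic stream."""
    rng = np.random.default_rng(seed)
    beta_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ beta_true + noise * rng.normal(size=n)

    # Pick the threshold so that roughly budget/n of the stream passes it,
    # estimated from a fresh sample of the (assumed known) covariate law.
    ref_norms = np.linalg.norm(rng.normal(size=(20000, d)), axis=1)
    threshold = np.quantile(ref_norms, 1.0 - budget / n)

    active_idx = select_by_threshold(X, budget, threshold)
    # Passive baseline: the first m rows, which are a random sample
    # because the rows are i.i.d.; m matches the active label count.
    passive_idx = np.arange(len(active_idx))

    err_active = np.sum((ols(X[active_idx], y[active_idx]) - beta_true) ** 2)
    err_passive = np.sum((ols(X[passive_idx], y[passive_idx]) - beta_true) ** 2)
    return err_active, err_passive


if __name__ == "__main__":
    errors = np.array([compare_once(seed) for seed in range(20)])
    print("mean squared error (active, passive):", errors.mean(axis=0))
```

Under these assumptions, the selected points have larger norms than a random sample of the same size, so the least-squares estimate built from them tends to have lower squared error. This toy setup uses a well-specified linear model with Gaussian covariates; it says nothing about the real-world datasets mentioned in the abstract.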
