Reduced Label Complexity For Tight $\ell_2$ Regression
This work addresses the challenge of minimizing label acquisition costs in regression tasks, offering a method to achieve tight approximations with fewer labels, though it is incremental as it leaves open the possibility of further reduction.
The paper tackles the problem of reducing label complexity in tight ℓ₂ regression by developing a polynomial algorithm that, without using labels, discards a subset of data points to achieve a (1+d/n)-approximation in expectation, reducing label complexity by Ω(√n).
Given data ${\rm X}\in\mathbb{R}^{n\times d}$ and labels $\mathbf{y}\in\mathbb{R}^{n}$ the goal is find $\mathbf{w}\in\mathbb{R}^d$ to minimize $\Vert{\rm X}\mathbf{w}-\mathbf{y}\Vert^2$. We give a polynomial algorithm that, \emph{oblivious to $\mathbf{y}$}, throws out $n/(d+\sqrt{n})$ data points and is a $(1+d/n)$-approximation to optimal in expectation. The motivation is tight approximation with reduced label complexity (number of labels revealed). We reduce label complexity by $Ω(\sqrt{n})$. Open question: Can label complexity be reduced by $Ω(n)$ with tight $(1+d/n)$-approximation?