LGMLMay 19, 2021

L1 Regression with Lewis Weights Subsampling

arXiv:2105.09433v123 citations
Originality Incremental advance
AI Analysis

This addresses efficient data sampling for robust regression in machine learning, offering exponential improvement in δ dependence compared to prior methods, though it is an incremental advance in subsampling techniques.

The paper tackles the problem of approximating L1 regression with limited label observations by subsampling rows using Lewis weights, achieving a solution within a 1+ε factor of optimal with high probability for m > O(1/ε² d log(d/(εδ))), and provides a matching lower bound.

We consider the problem of finding an approximate solution to $\ell_1$ regression while only observing a small number of labels. Given an $n \times d$ unlabeled data matrix $X$, we must choose a small set of $m \ll n$ rows to observe the labels of, then output an estimate $\widehatβ$ whose error on the original problem is within a $1 + \varepsilon$ factor of optimal. We show that sampling from $X$ according to its Lewis weights and outputting the empirical minimizer succeeds with probability $1-δ$ for $m > O(\frac{1}{\varepsilon^2} d \log \frac{d}{\varepsilon δ})$. This is analogous to the performance of sampling according to leverage scores for $\ell_2$ regression, but with exponentially better dependence on $δ$. We also give a corresponding lower bound of $Ω(\frac{d}{\varepsilon^2} + (d + \frac{1}{\varepsilon^2}) \log\frac{1}δ)$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes