DS ML May 18

On efficient robust regression with subquadratic samples

Deeksha Adil, Jarosław Błasiok, Hongjie Chen, Deepak Narayanan Sridharan

arXiv:2605.1804288.2

AI Analysis

For the robust linear regression problem, the paper provides an efficient algorithm with improved sample complexity, together with complementary lower bounds, advancing theoretical understanding of the trade-offs among sample size, condition number, and error.

The paper presents a near-linear-time algorithm for robust linear regression under Gaussian covariates with unknown covariance, achieving prediction error O(√(εκ)) using Õ(d/ε⁴) samples, improving over prior works. It also provides SQ and low-degree lower bounds showing fundamental sample complexity trade-offs.

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\widetilde{Ω}\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.
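
To get a feel for how the bounds in the abstract compare, the following minimal sketch evaluates them numerically. It is only an illustration: all constants and logarithmic factors hidden by the $\widetilde{O}$ and $\widetilde{Ω}$ notation are dropped (an assumption, not something the abstract specifies), and the function names are hypothetical, not taken from the paper.

```python
import math


def upper_bound_samples(d, eps):
    """d / eps^4: the algorithm's sample count, with constants and
    log factors dropped (assumption made for illustration only)."""
    return d / eps**4


def low_degree_lower_bound(d, eps, kappa):
    """min(d * eps^2 * kappa^2, eps^2 * d^2): the low-degree bound,
    again with constants and log factors dropped."""
    return min(d * eps**2 * kappa**2, eps**2 * d**2)


def target_error(eps, kappa):
    """sqrt(eps * kappa): the prediction error rate up to a constant.
    The abstract states this rate under eps * kappa <~ 1."""
    return math.sqrt(eps * kappa)


if __name__ == "__main__":
    d, eps, kappa = 10_000, 0.2, 4  # hypothetical example values
    print("d / eps^4          :", upper_bound_samples(d, eps))
    print("d^2                :", d**2)
    print("low-degree bound   :", low_degree_lower_bound(d, eps, kappa))
    print("sqrt(eps * kappa)  :", target_error(eps, kappa))
```

Under these simplifications, $d/ε^4$ is below $d^2$ exactly when $ε > d^{-1/4}$, and the two terms inside the minimum coincide at $κ = \sqrt{d}$: the first term is the smaller one for $κ \le \sqrt{d}$ and the second for larger $κ$.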
