ST | LG | ML | Feb 25, 2021

Distribution-Free Robust Linear Regression

arXiv:2102.12919v2 | 27 citations
AI Analysis

This paper addresses robust regression for heavy-tailed data without distributional assumptions. It offers a novel estimator with strong guarantees, though the contribution is incremental in that it builds on existing methods such as truncated least squares and median-of-means.

The paper tackles distribution-free linear regression with heavy-tailed responses. It shows that a bounded conditional second moment of the response is necessary and sufficient for nontrivial guarantees, and it constructs a non-linear estimator achieving an excess risk of order $d/n$ with an optimal sub-exponential tail.

We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order $d/n$ with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
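
To make two of the building blocks named in the abstract concrete, here is a minimal Python sketch of a truncated least squares predictor and a generic median-of-means mean estimate. It is an illustration under stated assumptions, not the paper's estimator: the function names, the truncation level `m`, and the block count `k` are hypothetical choices made here, and the aggregation step that the paper combines with these ideas is not shown.

```python
import numpy as np


def fit_truncated_least_squares(X, y, m):
    """Fit ordinary least squares, then clip predictions to [-m, m].

    The truncation level m is an assumed input; how it should be
    chosen is not specified in the abstract.
    """
    # lstsq returns a minimum-norm solution, so a rank-deficient X is fine.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(X_new):
        # Clipping makes the predictor non-linear in x (an improper estimator).
        return np.clip(np.asarray(X_new, dtype=float) @ w_hat, -m, m)

    return predict


def median_of_means(values, k):
    """Split values into k blocks, average each, return the median of the averages."""
    blocks = np.array_split(np.asarray(values, dtype=float), k)
    return float(np.median([block.mean() for block in blocks]))


# Usage: a noiseless line y = 2x, with predictions truncated at m = 3.
X = np.array([[1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0]
predict = fit_truncated_least_squares(X, y, m=3.0)
preds = predict(np.array([[1.0], [10.0], [-10.0]]))

# Usage: one extreme value moves a single block mean, not the median.
mom = median_of_means([1, 1, 1, 1, 1, 1000], k=3)
```

The clipped predictor is no longer a linear function of the covariates, which is the sense in which the abstract calls such procedures improper. The median-of-means helper shows why blocking helps with heavy tails: an outlier can corrupt only the block it falls in.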

