ST, LG, ML | Sep 19, 2020

Suboptimality of Constrained Least Squares and Improvements via Non-Linear Predictors

arXiv:2009.09304v2 | 17 citations
AI Analysis

This work addresses a theoretical gap in statistical learning for linear prediction under boundedness, with implications for algorithm design in robust statistics.

The paper demonstrates that the constrained least squares estimator fails to achieve the optimal O(d/n) excess risk rate when only boundedness of the distribution is assumed. It constructs a bounded distribution on which the estimator incurs an excess risk of order Ω(d^{3/2}/n), refuting a conjecture of Ohad Shamir [JMLR 2015], and observes that non-linear predictors can achieve the optimal O(d/n) rate with no assumptions on the distribution of the covariates.

We study the problem of predicting as well as the best linear predictor in a bounded Euclidean ball with respect to the squared loss. When only boundedness of the data generating distribution is assumed, we establish that the least squares estimator constrained to a bounded Euclidean ball does not attain the classical $O(d/n)$ excess risk rate, where $d$ is the dimension of the covariates and $n$ is the number of samples. In particular, we construct a bounded distribution such that the constrained least squares estimator incurs an excess risk of order $\Omega(d^{3/2}/n)$, hence refuting a recent conjecture of Ohad Shamir [JMLR 2015]. In contrast, we observe that non-linear predictors can achieve the optimal rate $O(d/n)$ with no assumptions on the distribution of the covariates. We discuss additional distributional assumptions sufficient to guarantee an $O(d/n)$ excess risk rate for the least squares estimator. Among them are certain moment equivalence assumptions often used in the robust statistics literature. While such assumptions are central in the analysis of unbounded and heavy-tailed settings, our work indicates that in some cases, they also rule out unfavorable bounded distributions.
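
To make the estimator under discussion concrete, here is a minimal sketch of least squares constrained to a Euclidean ball. The abstract does not say how the estimator is computed; projected gradient descent is used here purely as an illustrative assumption, and the names `project_to_ball`, `constrained_least_squares`, `radius`, and `steps` are hypothetical. The sketch only shows what the estimator is and does not reproduce the paper's lower-bound construction.

```python
import math


def project_to_ball(w, radius):
    """Euclidean projection of w onto the ball {v : ||v|| <= radius}."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    return [x * radius / norm for x in w]


def constrained_least_squares(X, y, radius, steps=2000):
    """Approximately minimise (1/n) * sum_i (y_i - <w, x_i>)^2 over ||w|| <= radius.

    Assumption: projected gradient descent with step size 1/L, where
    L = 2 * max_i ||x_i||^2 upper-bounds the smoothness constant of the
    empirical squared loss. This is one possible solver, not the paper's.
    """
    n, d = len(X), len(X[0])
    smoothness = 2.0 * max(sum(v * v for v in row) for row in X)
    w = [0.0] * d
    if smoothness == 0.0:
        return w  # all covariates are zero, so any w in the ball is optimal
    step_size = 1.0 / smoothness
    for _ in range(steps):
        # Residuals <w, x_i> - y_i for the current iterate.
        residuals = [
            sum(wj * xj for wj, xj in zip(w, row)) - yi
            for row, yi in zip(X, y)
        ]
        # Gradient of the empirical risk: (2/n) * sum_i residual_i * x_i.
        grad = [
            (2.0 / n) * sum(r * row[j] for r, row in zip(residuals, X))
            for j in range(d)
        ]
        # Gradient step followed by projection back onto the ball.
        w = project_to_ball(
            [wj - step_size * gj for wj, gj in zip(w, grad)], radius
        )
    return w


if __name__ == "__main__":
    # The unconstrained minimiser is w = 2; with radius 1 the constraint binds.
    print(constrained_least_squares([[1.0], [1.0]], [2.0, 2.0], radius=1.0))
```

In this one-dimensional example the unconstrained least squares solution lies outside the ball, so the constrained estimator sits on the boundary. The excess risk studied in the abstract compares the population risk of such an estimator to that of the best linear predictor in the same ball.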
