ML LG AP
Aug 14, 2020

Provable More Data Hurt in High Dimensional Least Squares Estimator

arXiv:2008.06296v1
6.76 citations

Originality: Incremental advance

AI Analysis

This paper addresses a counterintuitive issue in high-dimensional statistics: adding data can degrade model performance. The advance is incremental but important for theoretical understanding.

The paper investigates the finite-sample prediction risk of high-dimensional least squares estimators, deriving a central limit theorem and showing that prediction risk can increase with more data, confirming a 'more data hurt' phenomenon.

This paper investigates the finite-sample prediction risk of the high-dimensional least squares estimator. We derive the central limit theorem for the prediction risk when both the sample size and the number of features tend to infinity. Furthermore, the finite-sample distribution and the confidence interval of the prediction risk are provided. Our theoretical results demonstrate the sample-wise nonmonotonicity of the prediction risk and confirm the "more data hurt" phenomenon.
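The sample-wise nonmonotonicity described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's own experiment: it assumes isotropic Gaussian features, a unit-norm true coefficient vector, unit noise variance, and the minimum-norm (pseudoinverse) least squares solution when there are fewer samples than features. The function name `mean_prediction_risk` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def mean_prediction_risk(n, p, sigma=1.0, trials=200, seed=0):
    """Monte Carlo estimate of the excess prediction risk of least squares.

    Assumptions (not taken from the paper): features are i.i.d. N(0, 1),
    the true coefficient vector has unit norm, and the estimator is the
    minimum-norm solution pinv(X) @ y, which equals ordinary least squares
    when X has full column rank.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(p) / np.sqrt(p)  # unit-norm signal
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat = np.linalg.pinv(X) @ y
        # With identity feature covariance, the excess risk on a fresh
        # test point is the squared estimation error.
        risks.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(risks))

p = 50
# Sample sizes below, near, and above the number of features.
risk_by_n = {n: mean_prediction_risk(n, p) for n in (25, 45, 100)}
for n, risk in risk_by_n.items():
    print(f"n={n:4d}  estimated risk={risk:.3f}")
```

Under these assumptions, the estimated risk at n = 45 should exceed the risk at n = 25 even though more data is used, and it should fall again once n is well above p. The peak near n = p is where the design matrix is closest to singular.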
