ME ST ML · Aug 2, 2016

Can we trust the bootstrap in high-dimension?

arXiv:1608.00696v1 · 27 citations
Originality: Incremental advance
AI Analysis

This work examines whether bootstrap-based inference remains reliable in high-dimensional settings. It documents significant limitations of the standard methods and proposes incremental improvements for researchers and practitioners who rely on them.

The paper investigates the reliability of bootstrap methods for confidence intervals in high-dimensional linear regression, finding that both residual and pairs bootstrap perform poorly as the ratio of predictors to samples increases, with residual bootstrap leading to inflated Type I errors and pairs bootstrap causing severe loss of power. It proposes alternative procedures to mitigate these issues, though they depend on specific data assumptions, indicating challenges for universal bootstrapping in high dimensions.

We consider the performance of the bootstrap in high dimensions for the setting of linear regression, where $p<n$ but $p/n$ is not close to zero. We consider ordinary least-squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of $β$, the true regression vector? We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression, the residual bootstrap and the pairs bootstrap, give very poor inference on $β$ as the ratio $p/n$ grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio $p/n$ grows. We also show that the jackknife resampling technique for estimating the variance of $\hat{β}$ severely overestimates the variance in high dimensions. We contribute alternative bootstrap procedures based on our theoretical results that mitigate these problems. However, the corrections depend on assumptions regarding the underlying data-generation model, suggesting that in high dimensions it may be difficult to have universal, robust bootstrapping techniques.
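
To make the two standard procedures named in the abstract concrete, here is a minimal sketch of the residual bootstrap and the pairs bootstrap for ordinary least squares, each producing a percentile confidence interval for one coordinate of $β$. It is an illustration under assumptions not taken from the paper: the function names, the Gaussian design and errors, the percentile interval, and the choice $n=100$, $p=30$ are all hypothetical. It does not implement the corrected procedures the paper proposes.

```python
import numpy as np

def ols(X, y):
    # Least-squares coefficients; lstsq returns a minimum-norm solution
    # if a resampled design happens to be rank deficient.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def residual_bootstrap(X, y, n_boot=200, rng=None):
    # Resample centered residuals and add them back to the fitted values.
    rng = np.random.default_rng(rng)
    beta_hat = ols(X, y)
    fitted = X @ beta_hat
    resid = y - fitted
    resid = resid - resid.mean()
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        draws[b] = ols(X, y_star)
    return draws

def pairs_bootstrap(X, y, n_boot=200, rng=None):
    # Resample whole (x_i, y_i) rows with replacement and refit.
    rng = np.random.default_rng(rng)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(X[idx], y[idx])
    return draws

def percentile_ci(draws, coord=0, level=0.95):
    # Percentile interval for a single coordinate of beta.
    alpha = 1.0 - level
    lo, hi = np.quantile(draws[:, coord], [alpha / 2, 1 - alpha / 2])
    return lo, hi

def demo(n=100, p=30, seed=0):
    # Hypothetical setup: Gaussian design, Gaussian errors, beta = 0,
    # so p/n = 0.3 is "not close to zero" in the abstract's sense.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    res = residual_bootstrap(X, y, rng=rng)
    par = pairs_bootstrap(X, y, rng=rng)
    return res, par, percentile_ci(res), percentile_ci(par)

if __name__ == "__main__":
    _, _, ci_res, ci_par = demo()
    print("residual bootstrap CI:", ci_res)
    print("pairs bootstrap CI:   ", ci_par)
```

Repeating `demo` over many seeds and recording how often each interval excludes zero would give a rough empirical Type I error for each method, which is the kind of comparison the abstract describes as $p/n$ grows.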
