ST IT LG ME MLMay 5, 2024

Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

arXiv:2405.03063v11.2

Originality Incremental advance

AI Analysis

This work addresses a computational bottleneck in variable selection for statisticians and data scientists, offering incremental improvements to resampling-based methods.

The paper tackles the problem of updating Lasso coefficients when a design matrix column changes, proposing an approximate formula for debiased Lasso updates with nonasymptotic error bounds and proving asymptotic convergence for random designs. It shows the formula is asymptotically correct in most coordinates under mild assumptions and reduces computational complexity for variable selection algorithms like the conditional randomization test and knockoff filter.

Suppose that we first apply the Lasso to a design matrix, and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of a given design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d.\ sub-Gaussian row vectors and i.i.d.\ Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof only requires certain concentration and anti-concentration properties to control various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g.\ Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered open problem in the universality theory. As applications, we show that the approximate formula allows us to reduce the computation complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.

View on arXiv PDF

Similar