ML · DIS-NN · ME · Feb 28, 2018

Semi-Analytic Resampling in Lasso

arXiv:1802.10254v2 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks for researchers and practitioners using Lasso-based variable selection methods, offering an incremental improvement in speed and stability.

The paper tackles the computational cost of resampling in Lasso-based variable selection methods such as Bolasso and stability selection. It develops a semi-analytic method that computes averages over resampled datasets directly, without repeated sampling, which removes the statistical fluctuations caused by a finite number of resamples and substantially reduces computation time. Numerical experiments on simulated datasets examine the accuracy and efficiency of the approximation.

An approximate method for conducting resampling in Lasso, the $\ell_1$ penalized linear regression, in a semi-analytic manner is developed, whereby the average over the resampled datasets is directly computed without repeated numerical sampling, thus enabling an inference free of the statistical fluctuations due to sampling finiteness, as well as a significant reduction of computational time. The proposed method is based on a message passing type algorithm, and its fast convergence is guaranteed by the state evolution analysis, when covariates are provided as zero-mean independently and identically distributed Gaussian random variables. It is employed to implement bootstrapped Lasso (Bolasso) and stability selection, both of which are variable selection methods using resampling in conjunction with Lasso, and resolves their disadvantage regarding computational cost. To examine approximation accuracy and efficiency, numerical experiments were carried out using simulated datasets. Moreover, an application to a real-world dataset, the wine quality dataset, is presented. To process such real-world datasets, an objective criterion for determining the relevance of selected variables is also introduced by the addition of noise variables and resampling.
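For context, the repeated numerical sampling that the abstract describes as costly can be sketched as follows. This is only a minimal illustration of the naive baseline (bootstrap the data, refit Lasso, count how often each variable is selected), not the semi-analytic message passing method of the paper. The function names `lasso_cd` and `selection_probabilities`, the coordinate descent solver, the penalty value, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Plain coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Illustrative solver only; any Lasso solver could be substituted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column curvature
    resid = y.astype(float).copy()      # residual y - X beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes variable j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta

def selection_probabilities(X, y, lam, n_resamples=50, seed=0):
    """Naive resampling: fraction of bootstrap fits in which each variable is nonzero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # bootstrap sample, drawn with replacement
        beta = lasso_cd(X[idx], y[idx], lam)
        counts += (beta != 0)
    return counts / n_resamples

# Simulated data (assumed for illustration): i.i.d. zero-mean Gaussian covariates,
# with only the first three variables carrying signal.
rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

probs = selection_probabilities(X, y, lam=0.2)
print(np.round(probs, 2))
```

Each resample requires a full Lasso fit, so the cost grows linearly with the number of resamples, and the estimated selection probabilities still fluctuate with the random draws. These are the two drawbacks that the semi-analytic average is meant to avoid. Bolasso and stability selection differ in how they turn such selection frequencies into a final variable set; the sketch stops at the frequencies themselves.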

Code Implementations: 1 repo