Extensions of stability selection using subsamples of observations and covariates
This work extends stability selection, a technique for stabilising variable selection, and is aimed at statisticians and data scientists who work with high-dimensional data.
The paper addresses the instability of variable selection methods by extending stability selection to random subsamples of both observations and covariates. It generalizes the existing theoretical results to subsamples of arbitrary size and evaluates the extensions in numerical experiments on synthetic and real datasets.
We introduce extensions of stability selection, a method introduced by Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010) to stabilise variable selection methods. We propose to apply a base selection method repeatedly to random subsamples of the observations and random subsets of the covariates under scrutiny, and to select covariates based on their selection frequency. We analyse the effects and benefits of these extensions. Our analysis generalises the theoretical results of Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010) from the case of half-samples to subsamples of arbitrary size. We study theoretically the effect of taking random covariate subsets, using a simplified score model. Finally, we validate these extensions in numerical experiments on both synthetic and real datasets, and compare the results in detail with those of the original stability selection method.
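The procedure described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The base selector (keep the k covariates most correlated with the response), the function and parameter names, the synthetic data, and the choice to normalise each covariate's frequency by the number of rounds in which it was drawn are all assumptions made for illustration only.

```python
import random


def top_k_by_correlation(X, y, cols, k):
    """Hypothetical base selector: keep the k columns in `cols` whose
    absolute sample correlation with y is largest."""
    n = len(y)
    y_mean = sum(y) / n
    var_y = sum((b - y_mean) ** 2 for b in y)
    scores = {}
    for j in cols:
        xj = [row[j] for row in X]
        x_mean = sum(xj) / n
        cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(xj, y))
        var_x = sum((a - x_mean) ** 2 for a in xj)
        denom = (var_x * var_y) ** 0.5
        scores[j] = abs(cov) / denom if denom > 0 else 0.0
    return set(sorted(cols, key=lambda j: scores[j], reverse=True)[:k])


def extended_stability_selection(X, y, n_sub, p_sub, k, n_rounds,
                                 threshold, seed=0):
    """Apply the base selector to random observation subsamples of size
    n_sub and random covariate subsets of size p_sub, then keep the
    covariates whose selection frequency reaches `threshold`."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    selected = [0] * p  # rounds in which covariate j was selected
    drawn = [0] * p     # rounds in which covariate j was in the subset
    for _ in range(n_rounds):
        rows = rng.sample(range(n), n_sub)  # observation subsample
        cols = rng.sample(range(p), p_sub)  # covariate subset
        X_sub = [X[i] for i in rows]
        y_sub = [y[i] for i in rows]
        for j in cols:
            drawn[j] += 1
        for j in top_k_by_correlation(X_sub, y_sub, cols, k):
            selected[j] += 1
    # Assumption: frequency is measured relative to the rounds in which
    # the covariate was available for selection.
    freq = [selected[j] / drawn[j] if drawn[j] else 0.0 for j in range(p)]
    stable = [j for j in range(p) if freq[j] >= threshold]
    return stable, freq


# Synthetic example: only covariates 0 and 1 influence the response.
data_rng = random.Random(1)
n, p = 100, 10
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 3 * row[1] + data_rng.gauss(0, 0.5) for row in X]

stable, freq = extended_stability_selection(
    X, y, n_sub=50, p_sub=5, k=2, n_rounds=200, threshold=0.6)
print("stable covariates:", stable)
print("selection frequencies:", [round(f, 2) for f in freq])
```

Setting n_sub to half the number of observations and p_sub to the total number of covariates gives a scheme in the spirit of the original half-sample stability selection, while other values explore the arbitrary subsample sizes and covariate subsets that the abstract refers to.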