ML LG CO ME
Aug 1, 2023

Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin Wang

arXiv:2308.00251v17.45 citations h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the problem of sparse model identification for statisticians and data scientists, offering an incremental improvement in speed and consistency for variable selection.

The authors tackle best-subset selection in high-dimensional generalized linear models by developing a fast algorithm whose computational complexity is polynomial in sample size and dimension under mild conditions and which comes with statistical guarantees. Their implementation runs approximately four times faster than popular variable selection toolkits such as glmnet and ncvreg.

In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although best-subset selection has been widely regarded as the Holy Grail of problems of this type, achieving either computational efficiency or statistical guarantees is challenging. In this article, we intend to surmount this obstacle by utilizing a fast algorithm to select the best subset with high certainty. We propose and illustrate an algorithm for best-subset recovery under regularity conditions. Under mild conditions, the computational complexity of our algorithm scales polynomially with sample size and dimension. In addition to demonstrating the statistical properties of our method, extensive numerical experiments reveal that it outperforms existing methods for variable selection and coefficient estimation. The runtime analysis shows that our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits like glmnet and ncvreg.
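
To make the best-subset problem in the abstract concrete, the following is a minimal sketch of the naive exhaustive search that a fast algorithm is meant to avoid. It is not the splicing algorithm of the paper. It assumes the Gaussian linear model (the simplest generalized linear model), a fixed support size, and a small hypothetical data set; the function names `solve_linear_system`, `fit_subset`, and `best_subset` are illustrative only. The search fits one model per size-s subset of the p predictors, so its cost grows combinatorially with p.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_subset(X, y, subset):
    """Least-squares fit on the columns in `subset`; returns (rss, coefficients)."""
    gram = [[sum(row[a] * row[b] for row in X) for b in subset] for a in subset]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in subset]
    beta = solve_linear_system(gram, rhs)
    rss = sum(
        (yi - sum(bj * row[j] for bj, j in zip(beta, subset))) ** 2
        for row, yi in zip(X, y)
    )
    return rss, beta


def best_subset(X, y, support_size):
    """Exhaustive search: try every subset of the given size, keep the lowest RSS."""
    p = len(X[0])
    best = None
    for subset in combinations(range(p), support_size):
        rss, beta = fit_subset(X, y, subset)
        if best is None or rss < best[0]:
            best = (rss, subset, beta)
    return best


# Hypothetical toy data: y depends only on columns 0 and 2 (y = 2*x0 - 3*x2).
X = [
    [1, 0, 2, 1],
    [0, 1, 1, 3],
    [2, 1, 0, 1],
    [1, 3, 1, 0],
    [3, 1, 2, 2],
    [1, 2, 3, 1],
]
y = [2 * row[0] - 3 * row[2] for row in X]

rss, subset, beta = best_subset(X, y, support_size=2)
print(subset, beta, rss)
```

With 4 predictors and support size 2 the search fits only 6 models, but the count is the binomial coefficient C(p, s), which is why exhaustive search is impractical in high dimensions.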
