Bayesian Variable Selection in a Million Dimensions
This work addresses a computational bottleneck that limits researchers in fields such as biology and economics, making Bayesian variable selection feasible on large-scale datasets. The contribution is incremental, however, since it builds on existing MCMC methods.
The authors tackled the computational cost of Bayesian variable selection in high-dimensional settings with an efficient MCMC scheme whose cost per iteration is sublinear in the number of covariates P. They extended the scheme to generalized linear models for count data, namely binomial and negative binomial regression, and demonstrated its effectiveness on cancer and maize genomic data.
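How per-iteration cost can avoid growing with P is easiest to see in a minimal sketch under stated assumptions: a Gaussian linear model with known noise variance sigma2, an independent N(0, tau2) prior on each included coefficient, and independent Bernoulli(h) inclusion indicators gamma_j. This is a plain random-scan Gibbs sampler, not the authors' scheme. Each iteration updates only `subset_size` randomly chosen indicators from their full conditionals, and each update evaluates two marginal likelihoods at O(N k^2 + k^3) cost for k active covariates, so the work per iteration does not depend on P. The names `log_marginal_likelihood` and `subset_gibbs` and all default values are hypothetical.

```python
import numpy as np


def log_marginal_likelihood(X, y, active, sigma2=1.0, tau2=1.0):
    """log p(y | gamma) for y ~ N(X_g beta, sigma2 I), beta ~ N(0, tau2 I).

    Assumed model (illustrative only). Uses determinant and Woodbury identities
    so the cost is O(N k^2 + k^3) in the number k of active covariates.
    """
    N = y.shape[0]
    yty = y @ y
    k = len(active)
    if k == 0:
        return -0.5 * (N * np.log(2 * np.pi * sigma2) + yty / sigma2)
    Xg = X[:, active]                                  # N x k design for the active set
    A = Xg.T @ Xg / sigma2 + np.eye(k) / tau2          # posterior precision of beta
    L = np.linalg.cholesky(A)
    z = np.linalg.solve(L, Xg.T @ y)                   # z @ z equals b^T A^{-1} b
    logdet = N * np.log(sigma2) + k * np.log(tau2) + 2.0 * np.log(np.diag(L)).sum()
    quad = yty / sigma2 - (z @ z) / sigma2**2
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)


def subset_gibbs(X, y, n_iters, subset_size, h=0.01, sigma2=1.0, tau2=1.0, seed=0):
    """Hypothetical random-scan Gibbs sampler over inclusion indicators that
    touches only `subset_size` indicators per iteration. Returns Monte Carlo
    estimates of posterior inclusion probabilities (PIPs)."""
    rng = np.random.default_rng(seed)
    P = X.shape[1]
    active = set()                      # indices j with gamma_j = 1
    counts = np.zeros(P)                # iterations in which each j was active
    prior_log_odds = np.log(h) - np.log1p(-h)
    for _ in range(n_iters):
        # Sample indices with replacement: O(subset_size), no O(P) work.
        for j in rng.integers(P, size=subset_size):
            j = int(j)
            others = sorted(active - {j})
            log_odds = (prior_log_odds
                        + log_marginal_likelihood(X, y, others + [j], sigma2, tau2)
                        - log_marginal_likelihood(X, y, others, sigma2, tau2))
            # Full conditional P(gamma_j = 1 | rest) via a stable sigmoid.
            p_in = np.exp(-np.logaddexp(0.0, -log_odds))
            if rng.random() < p_in:
                active.add(j)
            else:
                active.discard(j)
        for i in active:                # O(k) bookkeeping instead of an O(P) scan
            counts[i] += 1
    return counts / n_iters
```

The trade-off is that each indicator is revisited only about once every P / subset_size iterations, so cheaper iterations come at the price of slower exploration.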
Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large-P regime, we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition, we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular, we design efficient algorithms for variable selection in binomial and negative binomial regression; the former includes logistic regression as a special case. In experiments, we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
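The statement that binomial regression includes logistic regression as a special case can be checked numerically. The sketch below writes both count likelihoods in terms of a linear predictor psi (in variable selection, psi would be X_gamma beta_gamma over the active covariates), assuming a logit link for the binomial model and a negative binomial parameterized by a dispersion r and log-odds psi, so that its mean is r * exp(psi). Both likelihoods share the form exp(a * psi) / (1 + exp(psi))^b, which is the structure that data-augmentation schemes such as Polya-Gamma augmentation exploit. The function names are hypothetical.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma)     # elementwise log-Gamma without SciPy


def binomial_loglik(y, n_trials, psi):
    """Per-observation log p(y | n, psi), success probability sigmoid(psi)."""
    log_comb = _lgamma(n_trials + 1) - _lgamma(y + 1) - _lgamma(n_trials - y + 1)
    return log_comb + y * psi - n_trials * np.logaddexp(0.0, psi)


def negbin_loglik(y, r, psi):
    """Per-observation negative binomial log-pmf (assumed parameterization):
    Gamma(y + r) / (Gamma(r) y!) * exp(psi * y) / (1 + exp(psi))^(y + r)."""
    log_norm = _lgamma(y + r) - math.lgamma(r) - _lgamma(y + 1)
    return log_norm + y * psi - (y + r) * np.logaddexp(0.0, psi)


def logistic_loglik(y, psi):
    """Bernoulli log-likelihood written directly in terms of probabilities."""
    p = 1.0 / (1.0 + np.exp(-psi))
    return y * np.log(p) + (1 - y) * np.log1p(-p)
```

With every trial count set to 1, `binomial_loglik` reduces term by term to `logistic_loglik`, since the binomial coefficient vanishes and y * psi - log(1 + exp(psi)) is exactly the Bernoulli log-likelihood.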