Scalable Spike-and-Slab
This work addresses a computational bottleneck for researchers and practitioners who use Bayesian variable selection in high-dimensional settings. It is an incremental improvement over existing methods.
The paper tackles the high computational cost of existing samplers for spike-and-slab priors in high-dimensional Bayesian regression. It proposes Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation that reduces the per-iteration cost from order $n^2 p$ to order $\max\{ n^2 p_t, np \}$, where $p_t$ never exceeds the number of covariates that switch spike-and-slab states between consecutive iterations. The authors report orders-of-magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
Spike-and-slab priors are commonly used for Bayesian variable selection, due to their interpretability and favorable statistical properties. However, existing samplers for spike-and-slab posteriors incur prohibitive computational costs when the number of variables is large. In this article, we propose Scalable Spike-and-Slab ($S^3$), a scalable Gibbs sampling implementation for high-dimensional Bayesian regression with the continuous spike-and-slab prior of George and McCulloch (1993). For a dataset with $n$ observations and $p$ covariates, $S^3$ has order $\max\{ n^2 p_t, np \}$ computational cost at iteration $t$, where $p_t$ never exceeds the number of covariates switching spike-and-slab states between iterations $t-1$ and $t$ of the Markov chain. This improves upon the order $n^2 p$ per-iteration cost of state-of-the-art implementations as, typically, $p_t$ is substantially smaller than $p$. We apply $S^3$ to synthetic and real-world datasets, demonstrating orders-of-magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
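
To make the stated cost orders concrete, the following minimal Python sketch compares the two expressions from the abstract for one pair of consecutive iterations. It is only a back-of-the-envelope illustration, not the authors' implementation: the function names, the 0/1 indicator encoding of spike-and-slab states, and the problem size are all assumptions made here, and constant factors are ignored.

```python
def count_switches(z_prev, z_curr):
    """Count covariates whose spike/slab indicator differs between two iterations."""
    if len(z_prev) != len(z_curr):
        raise ValueError("indicator vectors must have the same length")
    return sum(a != b for a, b in zip(z_prev, z_curr))


def s3_cost_order(n, p, p_t):
    """Per-iteration cost order quoted for S^3: max{n^2 * p_t, n * p}."""
    return max(n * n * p_t, n * p)


def baseline_cost_order(n, p):
    """Per-iteration cost order quoted for existing implementations: n^2 * p."""
    return n * n * p


# Hypothetical problem size and states, chosen only for illustration.
n, p = 100, 10_000
z_prev = [0] * p                      # assumed encoding: 0 = spike, 1 = slab
z_curr = list(z_prev)
for j in (3, 17, 256, 4096, 9999):    # five covariates switch state
    z_curr[j] = 1

# The abstract bounds p_t by the number of switching covariates,
# so the switch count is used here as a worst-case value of p_t.
p_t_bound = count_switches(z_prev, z_curr)
ratio = baseline_cost_order(n, p) / s3_cost_order(n, p, p_t_bound)
print(p_t_bound, ratio)
```

With these assumed numbers, five covariates switch state, the $np$ term dominates the maximum, and the ratio of the two cost orders is $n = 100$. The ratio shrinks as more covariates switch between iterations, which is why the improvement depends on $p_t$ being much smaller than $p$.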