MSLGCOMLMar 20, 2025

Efficiently Vectorized MCMC on Modern Accelerators

arXiv:2503.17405v22 citationsh-index: 2ICML
Originality Incremental advance
AI Analysis

This addresses inefficiencies in parallel MCMC execution for practitioners using modern accelerators, though it is incremental as it builds on existing vectorization tools.

The paper tackled the synchronization problem in vectorized multi-chain MCMC algorithms by designing single-chain algorithms using finite state machines (FSMs) to avoid overheads, resulting in speed-ups of up to an order of magnitude for methods like Elliptical Slice Sampling and HMC-NUTS.

With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem -- loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes