Comparison of parallel SMC and MCMC for Bayesian deep learning
This is an incremental study for researchers in Bayesian deep learning, providing a systematic comparison of parallel algorithms to improve computational efficiency.
This work compares parallel implementations of sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) for Bayesian deep learning, showing that both perform comparably to their non-parallel counterparts on datasets such as MNIST, CIFAR, and IMDb, but still require large wall-clock times and risk catastrophic non-convergence if they are not run for long enough.
This work systematically compares parallel implementations of two consistent (asymptotically unbiased) Bayesian deep learning algorithms: the sequential Monte Carlo sampler (SMC$_\parallel$) and Markov chain Monte Carlo (MCMC$_\parallel$). We provide a proof of convergence for SMC$_\parallel$ showing that it theoretically achieves the same level of convergence as a single monolithic SMC sampler, while the reduced communication lowers wall-clock time. It is well known that the first samples from MCMC need to be discarded to eliminate initialization bias, and that the number of discarded samples must grow like the logarithm of the number of parallel chains to control that bias for MCMC$_\parallel$. A systematic numerical study on MNIST, CIFAR, and IMDb reveals that parallel implementations of both methods match non-parallel implementations in performance and total cost, and also perform comparably to each other. However, both methods still require a large wall-clock time and suffer from catastrophic non-convergence if they are not run for long enough.
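As a rough illustration of the burn-in statement above, the following toy sketch runs several independent random-walk Metropolis chains on a one-dimensional standard normal target and discards a number of initial samples that grows with the logarithm of the number of chains. It is not the paper's implementation: the target, the function names, and the constants `base` and `scale` are all hypothetical choices made for illustration, and the chains are run in a sequential loop that stands in for parallel workers.

```python
import math
import random


def log_target(x):
    # Toy stand-in for a log-posterior: standard normal, up to a constant.
    return -0.5 * x * x


def run_chain(n_keep, burn_in, x0, step, rng):
    # Random-walk Metropolis; the first `burn_in` states are discarded.
    x = x0
    kept = []
    for t in range(burn_in + n_keep):
        proposal = x + rng.gauss(0.0, step)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        log_u = math.log(1.0 - rng.random())
        if log_u < log_target(proposal) - log_target(x):
            x = proposal
        if t >= burn_in:
            kept.append(x)
    return kept


def burn_in_for(n_chains, base=200, scale=100):
    # Hypothetical schedule: burn-in grows like log(number of chains).
    return base + math.ceil(scale * math.log(n_chains))


def parallel_mcmc_mean(n_chains, total_samples, x0=10.0, step=1.0, seed=0):
    # Split a fixed sample budget across chains and average the chain means.
    rng = random.Random(seed)
    per_chain = total_samples // n_chains
    burn = burn_in_for(n_chains)
    chain_means = []
    for _ in range(n_chains):
        kept = run_chain(per_chain, burn, x0, step, rng)
        chain_means.append(sum(kept) / len(kept))
    return sum(chain_means) / n_chains, burn


if __name__ == "__main__":
    estimate, burn = parallel_mcmc_mean(n_chains=8, total_samples=40000)
    print(f"burn-in per chain: {burn}, posterior-mean estimate: {estimate:.3f}")
```

Every chain starts from the same deliberately poor point `x0`, so each one carries the same initialization bias, and averaging over more chains does not remove it. This is why the discarded prefix has to lengthen as chains are added while the kept samples per chain shrink.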