Adaptive MCMC via Combining Local Samplers
This paper addresses the challenge of efficiently sampling from multimodal distributions in machine learning. Its approach to combining local samplers is incremental but improves performance in specific domains.
The paper tackles the problem of designing fast-mixing Markov chain Monte Carlo (MCMC) samplers by combining samples from multiple parallel chains that each explore a local region, using kernel Stein discrepancy to prioritize chains and a novel technique to estimate region probabilities. Experimental results indicate that the method can provide significant speedups: with NUTS as the base sampler, it remains competitive with NUTS on unimodal distributions and outperforms state-of-the-art competitors on synthetic multimodal problems and a sensor localization task.
Markov chain Monte Carlo (MCMC) methods are widely used in machine learning. A major problem with MCMC is how to design chains that mix fast over the whole state space and, in particular, how to select the parameters of an MCMC algorithm. Here we take a different approach: similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only part of the state space (e.g., only a few modes). The chains are prioritized based on kernel Stein discrepancy, which provides a good measure of performance locally. The samples from the independent chains are combined using a novel technique for estimating the probability of different regions of the sample space. Experimental results demonstrate that the proposed algorithm may provide significant speedups in different sampling problems. Most importantly, when combined with the state-of-the-art NUTS algorithm as the base MCMC sampler, our method remained competitive with NUTS on sampling from unimodal distributions, while significantly outperforming state-of-the-art competitors on synthetic multimodal problems as well as on a challenging sensor localization task.
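
To make the role of kernel Stein discrepancy (KSD) concrete, the following is a minimal sketch of how a set of samples might be scored against a target whose score function (the gradient of the log density) is known. It is not the paper's implementation: the RBF kernel, the fixed bandwidth, the V-statistic estimator, and the names ksd_rbf and score are all assumptions chosen for illustration. The abstract does not specify how KSD values are turned into chain priorities, so the sketch only compares two hypothetical chains.

```python
import numpy as np


def ksd_rbf(samples, score, bandwidth=1.0):
    """Squared KSD (V-statistic) of `samples` w.r.t. a target with score `score`.

    Assumptions (illustrative, not from the paper): RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 h^2)) with fixed bandwidth h, and
    `score` maps an (n, d) array to the (n, d) array of grad log p values.
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score(x)                                  # (n, d)
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)           # (n, n)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))              # kernel matrix

    # Four terms of the Stein kernel u_p(x_i, x_j):
    term1 = (s @ s.T) * k                                    # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k        # s_i . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k       # s_j . grad_x k
    term4 = (d / h2 - sqdist / h2 ** 2) * k                  # tr(grad_x grad_y k)
    return float(np.mean(term1 + term2 + term3 + term4))


# Hypothetical usage: target is a standard 2-D Gaussian, so score(x) = -x.
rng = np.random.default_rng(0)
std_normal_score = lambda x: -x
chain_a = rng.normal(loc=0.0, size=(300, 2))   # samples near the target
chain_b = rng.normal(loc=3.0, size=(300, 2))   # samples far from the target
ksd_a = ksd_rbf(chain_a, std_normal_score)
ksd_b = ksd_rbf(chain_b, std_normal_score)
print(f"KSD^2 chain A: {ksd_a:.4f}, chain B: {ksd_b:.4f}")
```

A lower value indicates that the samples match the target better under this kernel. Because the estimate needs only the score function, it does not require the normalizing constant of the target density.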