Stochastic Particle-Optimization Sampling and the Non-Asymptotic Convergence Theory
This work addresses a theoretical limitation of particle-optimization sampling methods and is relevant to researchers in Bayesian inference and machine learning, although it is incremental in that it builds on existing SVGD frameworks.
The paper identifies a theoretical pitfall in Stein variational gradient descent (SVGD), in which particles tend to collapse under certain conditions, and proposes a stochastic variant called SPOS that injects random noise to address this. It develops a non-asymptotic convergence theory for SPOS, showing that under a fixed computational budget, using more particles does not always improve approximation accuracy, a finding also verified experimentally.
Particle-optimization-based sampling (POS) is a recently developed and effective sampling technique that interactively updates a set of particles. A representative algorithm is Stein variational gradient descent (SVGD). We prove that, under certain conditions, SVGD experiences a theoretical pitfall, {\it i.e.}, particles tend to collapse. As a remedy, we generalize POS to a stochastic setting by injecting random noise into particle updates, thus yielding stochastic particle-optimization sampling (SPOS). Notably, for the first time, we develop a {\em non-asymptotic convergence theory} for the SPOS framework (related to SVGD), characterizing algorithm convergence in terms of the 1-Wasserstein distance w.r.t.\! the numbers of particles and iterations. Somewhat surprisingly, with the same number of updates (not too large) for each particle, our theory suggests that adopting more particles does not necessarily lead to a better approximation of the target distribution, due to the limited computational budget and numerical errors. This phenomenon is also observed in SVGD and verified via an experiment on synthetic data. Extensive experimental results verify our theory and demonstrate the effectiveness of the proposed framework.
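To make the idea of "injecting random noise into particle updates" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 1-D standard Gaussian target with potential U(x) = x^2/2, an RBF kernel with a fixed bandwidth, and an update that adds a Langevin drift, the usual SVGD interaction term, and Gaussian noise with inverse temperature beta = 1. With beta = 1, both the Langevin part and the SVGD part leave the target invariant in the mean-field limit, but the exact constants and weighting used in the paper may differ. All function and parameter names (`grad_U`, `rbf_kernel`, `spos_step`, `run_spos`, `wasserstein1_1d`, `step`, `beta`, `bandwidth`) are hypothetical. Accuracy is measured with the 1-D 1-Wasserstein distance, which for two equal-size samples reduces to the mean absolute difference of the sorted values.

```python
import numpy as np


def grad_U(x):
    # Hypothetical target: standard Gaussian p(x) ∝ exp(-U(x)) with U(x) = x^2 / 2,
    # so grad U(x) = x. Swap in the gradient of your own potential.
    return x


def rbf_kernel(x, bandwidth):
    # x has shape (M,). Returns K[i, j] = k(x_i, x_j) and
    # dK[i, j] = d k(x_j, x_i) / d x_j, the repulsive SVGD term.
    diff = x[:, None] - x[None, :]            # diff[i, j] = x_i - x_j
    K = np.exp(-diff**2 / (2.0 * bandwidth**2))
    dK = diff / bandwidth**2 * K              # pushes x_i away from nearby x_j
    return K, dK


def spos_step(x, step, beta, bandwidth, rng):
    """One SPOS-style update (assumed form): Langevin drift + SVGD term + noise."""
    M = x.shape[0]
    K, dK = rbf_kernel(x, bandwidth)
    g = grad_U(x)
    # SVGD direction: phi(x_i) = (1/M) sum_j [-k(x_j, x_i) grad U(x_j) + grad_{x_j} k(x_j, x_i)]
    phi = (-(K @ g) + dK.sum(axis=1)) / M
    # Independent Gaussian noise per particle; without it this reduces to SVGD plus a gradient step.
    noise = rng.standard_normal(M)
    return x - step * g + step * phi + np.sqrt(2.0 * step / beta) * noise


def wasserstein1_1d(a, b):
    # For two 1-D empirical measures with the same number of atoms,
    # the optimal coupling matches sorted samples, so W1 is the mean gap.
    return np.mean(np.abs(np.sort(a) - np.sort(b)))


def run_spos(num_particles, num_iters, step=0.05, beta=1.0, bandwidth=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize deliberately far from the target N(0, 1).
    x = rng.normal(loc=3.0, scale=0.5, size=num_particles)
    for _ in range(num_iters):
        x = spos_step(x, step, beta, bandwidth, rng)
    return x


if __name__ == "__main__":
    reference = np.random.default_rng(1).standard_normal(200)  # draws from the target
    particles = run_spos(num_particles=200, num_iters=500)
    print("mean, std:", particles.mean(), particles.std())
    print("W1 to reference sample:", wasserstein1_1d(particles, reference))
```

To probe the particles-versus-iterations trade-off discussed above, one could vary `num_particles` while holding `num_iters` fixed and compare the resulting W1 values. This toy setup is not claimed to reproduce the paper's experiments or its quantitative findings.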