Preconditioned Regularized Wasserstein Proximal Sampling
This work addresses sampling challenges in machine learning, particularly for Bayesian inference, but is incremental, since it builds on a recently proposed method and modifies it to improve efficiency.
The authors tackled the problem of sampling from Gibbs distributions by proposing a preconditioned, noise-free method that approximates the score function using a regularized Wasserstein proximal operator, resulting in accelerated and stable performance across various examples, including Bayesian image deconvolution and neural network training.
We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, in which the score function is approximated by the numerically tractable score of a regularized Wasserstein proximal operator. A kernel formula for the preconditioned regularized Wasserstein proximal is derived by applying a Cole–Hopf transformation to coupled anisotropic heat equations. The diffusion component of the proposed method can also be interpreted as a modified self-attention block, as in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization but not on the step size. Experiments on examples ranging from log-concave and non-log-concave toy problems to Bayesian total-variation-regularized image deconvolution demonstrate acceleration and particle-level stability, and, with variable preconditioning matrices, the method achieves competitive or better performance on non-convex Bayesian neural network training.
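The following is a minimal sketch of the kind of kernel-based, noise-free particle update described above. It rests on assumptions that are ours, not necessarily the paper's. The potential is taken to be quadratic, V(x) = ½xᵀAx, so the kernel's normalizing integral Z(y) has a closed form. The preconditioned kernel is assumed to be K(x, y) ∝ exp(−(β/2)(V(x) + (x − y)ᵀP⁻¹(x − y)/(2T))) / Z(y), with a fixed symmetric positive-definite preconditioner P and regularization parameter T. The particles are assumed to move with the velocity −P(∇V + β⁻¹∇log ρ_T), where ρ_T is the kernel applied to the current empirical measure. The function name `prox_kernel_step` and all parameter names are hypothetical. The row-wise softmax over particles j shows where a self-attention reading can come from: each particle (query) attends to every particle (key) with logits given by a Mahalanobis distance plus a per-key bias −log Z(y_j).

```python
import numpy as np


def prox_kernel_step(X, A, P, beta, T, h):
    """One noise-free particle step (hypothetical sketch, quadratic V only).

    X: (N, d) particles; A: (d, d) SPD matrix of V(x) = 0.5 x^T A x;
    P: (d, d) SPD preconditioner (assumed fixed); beta: inverse temperature;
    T: regularization parameter; h: step size.
    """
    P_inv = np.linalg.inv(P)
    M_inv = np.linalg.inv(A + P_inv / T)
    # log Z(y_j) up to a y-independent constant (Gaussian integral over z).
    PY = X @ P_inv  # row j is (P^{-1} y_j)^T because P is symmetric
    log_Z = -beta / 4 * (
        np.einsum("jd,jd->j", X, PY) / T
        - np.einsum("jd,de,je->j", PY, M_inv, PY) / T**2
    )
    # Attention-like logits: query x_i against key y_j (keys = current particles).
    diff = X[:, None, :] - X[None, :, :]  # entry [i, j] = x_i - y_j
    sq_dist = np.einsum("ijd,de,ije->ij", diff, P_inv, diff)
    logits = -beta / (4 * T) * sq_dist - log_Z[None, :]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)  # row-wise softmax over j
    grad_V = X @ A  # row i is (A x_i)^T because A is symmetric
    # Under the assumed kernel, -P(grad V + beta^{-1} grad log rho_T)
    # simplifies to -0.5 P grad V + (x_i - sum_j W_ij y_j) / (2T).
    velocity = -0.5 * grad_V @ P + (X - W @ X) / (2 * T)
    return X + h * velocity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 4.0])   # target N(0, (beta A)^{-1}) = N(0, diag(1, 0.25))
    P = np.linalg.inv(A)      # an assumed fixed preconditioner
    X = 3.0 * rng.standard_normal((400, 2))
    for _ in range(400):
        X = prox_kernel_step(X, A, P, beta=1.0, T=0.1, h=0.1)
    print(np.var(X, axis=0))  # compare with the target variances [1, 0.25]
```

In this sketch, any fixed point of the update satisfies −½P∇V(x_i) + (x_i − Σ_j W_ij y_j)/(2T) = 0 for every particle. That condition involves T but not h, which mirrors, for this toy construction only, the abstract's statement that the bias depends on the regularization and not on the step size.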