STLGPRMLSep 13, 2024

Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

arXiv:2409.08469v315 citationsh-index: 17
Originality Incremental advance
AI Analysis

This provides theoretical guarantees for SVGD, a popular algorithm in machine learning for approximate Bayesian inference, though it is incremental as it builds on existing analysis.

The paper tackles the problem of analyzing finite-particle convergence rates for Stein Variational Gradient Descent (SVGD), achieving rates of order 1/√N in Kernelized Stein Discrepancy and Wasserstein-2 metrics, with a double exponential improvement over prior work and polynomial growth in dimension under mild assumptions.

We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence in continuous time. For the case of `bilinear + Matérn' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes