Kernel Stein Discrepancy Descent
The paper addresses sampling problems in machine learning and statistics with a new particle-based approach; the contribution is incremental, and the authors themselves identify its limitations.
The paper tackles the problem of sampling from a target probability distribution known only up to a normalization constant. It proposes KSD Descent, a deterministic, score-based particle method that can leverage robust optimization schemes such as L-BFGS but can get stuck in spurious local minima.
Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant. This leads to a straightforwardly implementable, deterministic score-based method to sample from $\pi$, named KSD Descent, which uses a set of particles to approximate $\pi$. Remarkably, owing to a tractable loss function, KSD Descent can leverage robust parameter-free optimization schemes such as L-BFGS; this contrasts with other popular particle-based schemes such as the Stein Variational Gradient Descent algorithm. We study the convergence properties of KSD Descent and demonstrate its practical relevance. However, we also highlight failure cases by showing that the algorithm can get stuck in spurious local minima.
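To make the idea concrete, the following is a minimal sketch of a KSD-minimizing particle scheme, not the authors' implementation. It rests on several assumptions that do not come from the text above: a Gaussian kernel with a fixed bandwidth `h`, a standard Gaussian target (so the score is $\nabla \log \pi(x) = -x$), and SciPy's L-BFGS-B routine with finite-difference gradients standing in for an exact gradient. The function names `gaussian_score`, `ksd_squared`, and `ksd_descent` are hypothetical. The loss is the squared KSD of the empirical measure of the particles, i.e. the average over all particle pairs of the Stein kernel $k_\pi(x, y) = s(x)^\top s(y)\, k(x, y) + s(x)^\top \nabla_y k(x, y) + \nabla_x k(x, y)^\top s(y) + \mathrm{tr}(\nabla_x \nabla_y k(x, y))$ with $s = \nabla \log \pi$.

```python
import numpy as np
from scipy.optimize import minimize


def gaussian_score(x):
    """Score of a standard Gaussian target (assumed here): grad log pi(x) = -x."""
    return -x


def ksd_squared(x, score, h=1.0):
    """Squared KSD of the empirical measure of particles x, shape (n, d).

    Uses a Gaussian kernel k(x, y) = exp(-|x - y|^2 / (2 h^2)); the kernel
    choice and the bandwidth h are assumptions of this sketch.
    """
    n, d = x.shape
    s = score(x)                               # s[i] = score at particle i
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)            # squared pairwise distances
    k = np.exp(-r2 / (2 * h ** 2))
    term_ss = s @ s.T                          # s(x_i) . s(x_j)
    # (s(x_i) . grad_y k + grad_x k . s(x_j)) / k for the Gaussian kernel
    term_cross = (np.einsum("id,ijd->ij", s, diff)
                  - np.einsum("jd,ijd->ij", s, diff)) / h ** 2
    # tr(grad_x grad_y k) / k for the Gaussian kernel
    term_trace = d / h ** 2 - r2 / h ** 4
    stein_kernel = k * (term_ss + term_cross + term_trace)
    return stein_kernel.mean()                 # average over all n^2 pairs


def ksd_descent(x0, score, h=1.0, maxiter=200):
    """Minimize the squared KSD over particle positions with L-BFGS-B.

    Gradients are approximated by SciPy's finite differences, a
    simplification; an exact gradient would be preferable in practice.
    """
    n, d = x0.shape

    def loss(flat):
        return ksd_squared(flat.reshape(n, d), score, h)

    result = minimize(loss, x0.ravel(), method="L-BFGS-B",
                      options={"maxiter": maxiter})
    return result.x.reshape(n, d), result.fun


# Usage: start 20 particles away from the mode and let them descend.
rng = np.random.default_rng(0)
x0 = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
initial = ksd_squared(x0, gaussian_score)
particles, final = ksd_descent(x0, gaussian_score)
print(f"squared KSD: {initial:.4f} -> {final:.4f}")
```

Because the loss is an explicit function of the particle positions, it can be handed to a generic quasi-Newton optimizer without a hand-tuned step size, which is the property the abstract emphasizes. The sketch does not guard against the spurious local minima mentioned above; on multimodal targets the result may depend strongly on the initial particles.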