Stein Variational Gradient Descent with Multiple Kernel
This work addresses a specific bottleneck in SVGD-based approximate inference for complex distributions, namely the reliance on a single hand-picked kernel, and offers an incremental improvement over existing SVGD variants.
The paper tackles the sub-optimal performance of Stein variational gradient descent (SVGD) methods that rely on a single kernel. It proposes a multiple kernel approach that approximates the optimal kernel with a weighted combination of kernels. The resulting method, Multiple Kernel SVGD (MK-SVGD), is reported to consistently match or outperform competing methods in experiments across various tasks and models.
Stein variational gradient descent (SVGD) and its variants have shown promising results in approximate inference for complex distributions. In practice, we notice that the kernel used in SVGD-based methods has a decisive effect on empirical performance. The radial basis function (RBF) kernel with the median heuristic is a common choice in previous approaches, but unfortunately it has proven to be sub-optimal. Inspired by the paradigm of Multiple Kernel Learning (MKL), our solution to this flaw is to use a combination of multiple kernels to approximate the optimal kernel, rather than a single kernel that may limit performance and flexibility. Specifically, we first extend Kernelized Stein Discrepancy (KSD) to a multiple-kernel view called Multiple Kernelized Stein Discrepancy (MKSD), and then leverage MKSD to construct a general algorithm, Multiple Kernel SVGD (MK-SVGD). Further, MK-SVGD can automatically assign a weight to each kernel without any additional parameters, which means that our method not only removes the dependence on an optimal kernel but also maintains computational efficiency. Experiments on various tasks and models demonstrate that our proposed method consistently matches or outperforms the competing methods.
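To make the idea of a kernel combination concrete, here is a minimal NumPy sketch of one SVGD update in which the single RBF kernel is replaced by a weighted sum of RBF kernels whose bandwidths are multiples of the median-heuristic bandwidth. It is an illustration under stated assumptions, not the paper's algorithm: the kernel weights are fixed by the caller, whereas MK-SVGD is described as assigning them automatically via MKSD, and that weighting rule is not reproduced here. The function names (`median_bandwidth`, `rbf_kernel_and_grad`, `mk_svgd_step`) and the `weights` and `scales` parameters are hypothetical.

```python
import numpy as np


def median_bandwidth(x):
    """Median-heuristic bandwidth h = median(squared distances) / log(n + 1)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    med = np.median(sq_dist[np.triu_indices(n, k=1)])
    return max(med / np.log(n + 1.0), 1e-8)  # guard against a zero bandwidth


def rbf_kernel_and_grad(x, h):
    """Return K[i, j] = exp(-||x_i - x_j||^2 / h) and the repulsive term.

    The second output has shape (n, d); row i is sum_j grad_{x_j} k(x_j, x_i).
    """
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / h)
    grad_K = (2.0 / h) * np.sum(diff * K[:, :, None], axis=1)
    return K, grad_K


def mk_svgd_step(x, score_fn, weights, scales, step_size):
    """One SVGD step with the kernel k = sum_m weights[m] * RBF(scales[m] * h_med).

    Assumption: `weights` are fixed, non-negative, and supplied by the caller.
    """
    n = x.shape[0]
    h_med = median_bandwidth(x)
    score = score_fn(x)                        # grad log p at each particle, (n, d)
    phi = np.zeros_like(x)
    for w, c in zip(weights, scales):
        K, grad_K = rbf_kernel_and_grad(x, c * h_med)
        # The SVGD direction is linear in the kernel, so the directions
        # of the individual kernels can simply be added with their weights.
        phi += w * (K @ score + grad_K) / n
    return x + step_size * phi
```

A typical use would draw initial particles, pass the score function of the target (for a unit-variance Gaussian with mean `mu`, `lambda p: -(p - mu)`), and call `mk_svgd_step` repeatedly. Because the update direction is linear in the kernel, swapping the fixed weights for learned ones would only change how `weights` is produced, not the structure of the step.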