MLJun 13, 2024
Operator-Informed Score Matching for Markov Diffusion ModelsZheyang Shen, Huihui Wang, Marina Riabiz et al.
Diffusion models are typically trained using score matching, a learning objective agnostic to the underlying noising process that guides the model. This paper argues that Markov noising processes enjoy an advantage over alternatives, as the Markov operators that govern the noising process are well-understood. Specifically, by leveraging the spectral decomposition of the infinitesimal generator of the Markov noising process, we obtain parametric estimates of the score functions simultaneously for all marginal distributions, using only sample averages with respect to the data distribution. The resulting operator-informed score matching provides both a standalone approach to sample generation for low-dimensional distributions, as well as a recipe for better informed neural score estimators in high-dimensional settings.
MLOct 7, 2021
De-randomizing MCMC dynamics with the diffusion Stein operatorZheyang Shen, Markus Heinonen, Samuel Kaski
Approximate Bayesian inference estimates descriptors of an intractable target distribution - in essence, an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL-divergence via steepest descent in the Wasserstein space. Parallel to LD, Stein variational gradient descent (SVGD) similarly minimizes the KL, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizes the stochastic diffusion process. We propose de-randomized kernel-based particle samplers to all diffusion-based samplers known as MCMC dynamics. Following previous work in interpreting MCMC dynamics, we equip the Stein-Wasserstein space with a fiber-Riemannian Poisson structure, with the capacity of characterizing a fiber-gradient Hamiltonian flow that simulates MCMC dynamics. Such dynamics discretizes into generalized SVGD (GSVGD), a Stein-type deterministic particle sampler, with particle updates coinciding with applying the diffusion Stein operator to a kernel function. We demonstrate empirically that GSVGD can de-randomize complex MCMC dynamics, which combine the advantages of auxiliary momentum variables and Riemannian structure, while maintaining the high sample quality from an interacting particle system.
MLMar 6, 2020
Sparse Gaussian Processes Revisited: Bayesian Approaches to Inducing-Variable ApproximationsSimone Rossi, Markus Heinonen, Edwin V. Bonilla et al.
Variational inference techniques based on inducing variables provide an elegant framework for scalable posterior estimation in Gaussian process (GP) models. Besides enabling scalability, one of their main advantages over sparse approximations using direct marginal likelihood maximization is that they provide a robust alternative for point estimation of the inducing inputs, i.e. the location of the inducing variables. In this work we challenge the common wisdom that optimizing the inducing inputs in the variational framework yields optimal performance. We show that, by revisiting old model approximations such as the fully-independent training conditionals endowed with powerful sampling-based inference methods, treating both inducing locations and GP hyper-parameters in a Bayesian way can improve performance significantly. Based on stochastic gradient Hamiltonian Monte Carlo, we develop a fully Bayesian approach to scalable GP and deep GP models, and demonstrate its state-of-the-art performance through an extensive experimental campaign across several regression and classification problems.
MLMay 23, 2019
Learning spectrograms with convolutional spectral kernelsZheyang Shen, Markus Heinonen, Samuel Kaski
We introduce the convolutional spectral kernel (CSK), a novel family of non-stationary, nonparametric covariance kernels for Gaussian process (GP) models, derived from the convolution between two imaginary radial basis functions. We present a principled framework to interpret CSK, as well as other deep probabilistic models, using approximated Fourier transform, yielding a concise representation of input-frequency spectrogram. Observing through the lens of the spectrogram, we provide insight on the interpretability of deep models. We then infer the functional hyperparameters using scalable variational and MCMC methods. On small- and medium-sized spatiotemporal datasets, we demonstrate improved generalization of GP models when equipped with CSK, and their capability to extract non-stationary periodic patterns.
MLOct 10, 2018
Harmonizable mixture kernels with variational Fourier featuresZheyang Shen, Markus Heinonen, Samuel Kaski
The expressive power of Gaussian processes depends heavily on the choice of kernel. In this work we propose the novel harmonizable mixture kernel (HMK), a family of expressive, interpretable, non-stationary kernels derived from mixture models on the generalized spectral representation. As a theoretically sound treatment of non-stationary kernels, HMK supports harmonizable covariances, a wide subset of kernels including all stationary and many non-stationary covariances. We also propose variational Fourier features, an inter-domain sparse GP inference framework that offers a representative set of 'inducing frequencies'. We show that harmonizable mixture kernels interpolate between local patterns, and that variational Fourier features offers a robust kernel learning framework for the new kernel family.