61.1 ML May 14
Training-Free Generative Sampling via Moment-Matched Score Smoothing
Zhenyu Yao, Daniel Paulin
Diffusion models generate samples by denoising along the score of a perturbed target distribution. In practice, one trains a neural diffusion model, which is computationally expensive. Recent work suggests that score matching implicitly smooths the empirical score, and that this smoothing bias promotes generalization by capturing low-dimensional data geometry. We propose moment-matched score-smoothed overdamped Langevin dynamics (MM-SOLD), a training-free interacting particle sampler that enforces the target moments throughout the sampling trajectory. We prove that, in the large-particle limit, the empirical particle density converges to a deterministic limit whose one-particle stationary marginal is a Gibbs--Boltzmann density obtained by exponentially tilting a naive score-smoothed diffusion target. The mean and covariance of this distribution agree with the empirical moments of the training data. Experiments on 2D distributions and latent-space image generation show that MM-SOLD enables fast, robust, training-free sampling on CPUs, with sample fidelity and diversity competitive with neural diffusion baselines.
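To make the "empirical score" in the abstract above concrete, here is a minimal one-dimensional sketch of the score of a Gaussian-perturbed empirical distribution. It is not the MM-SOLD sampler. The function name `empirical_score`, the noise level `sigma`, and the toy data are assumptions made for illustration.

```python
import math

def empirical_score(x, data, sigma):
    """Score d/dx log p_sigma(x) for p_sigma = (1/n) * sum_i N(x_i, sigma^2).

    Illustrative helper (hypothetical name); one-dimensional only.
    """
    # Log-weights of each Gaussian component at x (shared constants cancel).
    logw = [-(x - xi) ** 2 / (2.0 * sigma ** 2) for xi in data]
    m = max(logw)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logw]
    total = sum(w)
    # Softmax-weighted average of the per-component scores (x_i - x) / sigma^2.
    return sum(wi * (xi - x) for wi, xi in zip(w, data)) / (total * sigma ** 2)

# Toy data (assumed): two training points at -1 and +1.
data = [-1.0, 1.0]
score_at_zero = empirical_score(0.0, data, sigma=0.5)  # symmetric point between the data
score_single = empirical_score(2.0, [0.0], sigma=1.0)  # single Gaussian: -x / sigma^2
```

For small `sigma` the softmax weights concentrate on the nearest training point, so following this unsmoothed score tends to pull samples back onto the training data. That behaviour is the usual motivation for smoothing the empirical score.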
ML Feb 6
Infinite-dimensional generative diffusions via Doob's h-transform
Thorben Pieper-Sethmacher, Daniel Paulin
This paper introduces a rigorous framework for defining generative diffusion models in infinite dimensions via Doob's h-transform. Rather than relying on time reversal of a noising process, a reference diffusion is forced towards the target distribution by an exponential change of measure. Compared to existing methodology, this approach readily generalises to the infinite-dimensional setting, hence offering greater flexibility in the diffusion model. The construction is derived rigorously under verifiable conditions, and bounds with respect to the target measure are established. We show that the forced process under the changed measure can be approximated by minimising a score-matching objective and validate our method on both synthetic and real data.
85.4 CO Apr 27
Theoretical guarantees for stochastic gradient sampling methods via Gaussian convolution inequalities
Daniel Paulin, Peter A. Whalley
We derive first-order (in the stepsize) bounds on the bias in Wasserstein distances of the invariant measure of stochastic gradient kinetic Langevin dynamics with minimal assumptions on the stochastic gradient noise. These bounds sharpen existing non-asymptotic guarantees for stochastic-gradient MCMC methods and provide a quantitative resolution of a previously open problem on invariant measure accuracy. The main technical ingredients are new Gaussian convolution inequalities controlling the Wasserstein-$p$ distance between a Gaussian convolved with a mean-zero perturbation and the Gaussian itself. We anticipate that these inequalities will be of independent interest beyond the present application.
ML Oct 14, 2024
Sampling from Bayesian Neural Network Posteriors with Symmetric Minibatch Splitting Langevin Dynamics
Daniel Paulin, Peter A. Whalley, Neil K. Chada et al.
We propose a scalable kinetic Langevin dynamics algorithm for sampling parameter spaces of big data and AI applications. Our scheme combines a symmetric forward/backward sweep over minibatches with a symmetric discretization of Langevin dynamics. For a particular Langevin splitting method (UBU), we show that the resulting Symmetric Minibatch Splitting-UBU (SMS-UBU) integrator has bias $O(h^2 d^{1/2})$ in dimension $d>0$ with stepsize $h>0$, despite only using one minibatch per iteration, thus providing excellent control of the sampling bias as a function of the stepsize. We apply the algorithm to explore local modes of the posterior distribution of Bayesian neural networks (BNNs) and evaluate the calibration performance of the posterior predictive probabilities for neural networks with convolutional neural network architectures for classification problems on three different datasets (Fashion-MNIST, Celeb-A and chest X-ray). Our results indicate that BNNs sampled with SMS-UBU can offer significantly better calibration performance compared to standard methods of training and stochastic weight averaging.
ML Feb 13, 2024
Correction to "Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations"
Daniel Paulin, Peter A. Whalley
A method for analyzing non-asymptotic guarantees of numerical discretizations of ergodic SDEs in Wasserstein-2 distance is presented by Sanz-Serna and Zygalakis in "Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations". They analyze the UBU integrator, which is strong order two and requires only one gradient evaluation per step, resulting in desirable non-asymptotic guarantees, in particular $\mathcal{O}(d^{1/4}ε^{-1/2})$ steps to reach a distance of $ε > 0$ from the target distribution in Wasserstein-2 distance. However, there is a mistake in the local error estimates in Sanz-Serna and Zygalakis (2021): a stronger assumption is needed to achieve these complexity estimates. This note reconciles the theory with the dimension dependence observed in practice in many applications of interest.
ML Sep 26, 2025
A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
Trinnhallen Brisley, Gordon Ross, Daniel Paulin
Hawkes process models are used in settings where past events increase the likelihood of future events occurring. Many applications record events as counts on a regular grid, yet discrete-time Hawkes models remain comparatively underused and are often constrained by fixed-form baselines and excitation kernels. In particular, there is a lack of flexible, nonparametric treatments of both the baseline and the excitation in discrete time. To this end, we propose the Gaussian Process Discrete Hawkes Process (GP-DHP), a nonparametric framework that places Gaussian process priors on both the baseline and the excitation and performs inference through a collapsed latent representation. This yields smooth, data-adaptive structure without prespecifying trends, periodicities, or decay shapes, and enables maximum a posteriori (MAP) estimation with near-linear-time $O(T\log T)$ complexity. A closed-form projection recovers interpretable baseline and excitation functions from the optimized latent trajectory. In simulations, GP-DHP recovers diverse excitation shapes and evolving baselines. In case studies on U.S. terrorism incidents and weekly cryptosporidiosis counts, it improves test predictive log-likelihood over standard parametric discrete Hawkes baselines while capturing bursts, delays, and seasonal background variation. The results indicate that flexible discrete-time self-excitation can be achieved without sacrificing scalability or interpretability.
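For orientation, the kind of fixed-form discrete-time Hawkes model that the abstract above contrasts with can be sketched as follows. This is not GP-DHP: the constant baseline `mu`, the geometric kernel, and the function name `discrete_hawkes_intensity` are assumptions chosen only to illustrate how past counts raise the current intensity.

```python
def discrete_hawkes_intensity(counts, mu, alpha, beta):
    """Conditional intensity lambda_t = mu + sum_{s<t} g(t - s) * counts[s].

    Assumed parametric form (illustrative): constant baseline mu and
    geometric kernel g(d) = alpha * (1 - beta) * beta**(d - 1) for lag d >= 1.
    """
    intensities = []
    excitation = 0.0  # running value of sum_{s<t} g(t - s) * counts[s]
    for y in counts:
        intensities.append(mu + excitation)
        # The geometric kernel admits a one-step recursion for the excitation term.
        excitation = beta * excitation + alpha * (1.0 - beta) * y
    return intensities

# Toy count series on a regular grid (assumed data).
counts = [0, 2, 0, 0, 1]
lam = discrete_hawkes_intensity(counts, mu=0.5, alpha=0.8, beta=0.5)
# lam is [0.5, 0.5, 1.3, 0.9, 0.7] up to floating-point rounding
```

The two events at the second time step lift the intensity at the third step, after which the excitation decays geometrically. The one-step recursion is specific to the geometric kernel assumed here, which is exactly the kind of prespecified decay shape a nonparametric treatment avoids.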
ML Dec 1, 2021
On Mixing Times of Metropolized Algorithm With Optimization Step (MAO): A New Framework
EL Mahdi Khribch, George Deligiannidis, Daniel Paulin
In this paper, we consider sampling from a class of distributions with thin tails supported on $\mathbb{R}^d$ and make two primary contributions. First, we propose a new Metropolized Algorithm With Optimization Step (MAO), which is well suited for such targets. Our algorithm is capable of sampling from distributions for which the Metropolis-adjusted Langevin algorithm (MALA) does not converge or lacks theoretical guarantees. Second, we derive upper bounds on the mixing time of MAO. Our results are supported by simulations on multiple target distributions.
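As a point of reference for the MALA baseline named above, here is a minimal one-dimensional sketch of the standard Metropolis-adjusted Langevin algorithm. It is not MAO, whose optimization step is not described in the abstract. The function name `mala`, the step size, and the standard Gaussian test target are assumptions for illustration.

```python
import math
import random

def mala(log_density, grad_log_density, x0, step, n_iter, seed=0):
    """Standard 1D MALA: Langevin proposal plus Metropolis-Hastings correction."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0

    def log_q(y, x_from):
        # Log proposal density of moving from x_from to y, up to a constant.
        mean = x_from + step * grad_log_density(x_from)
        return -(y - mean) ** 2 / (4.0 * step)

    for _ in range(n_iter):
        noise = rng.gauss(0.0, 1.0)
        y = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * noise
        log_ratio = (log_density(y) + log_q(x, y)) - (log_density(x) + log_q(y, x))
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter

# Assumed test target: standard Gaussian, log pi(x) = -x^2 / 2.
samples, accept_rate = mala(lambda x: -0.5 * x * x, lambda x: -x,
                            x0=0.0, step=0.5, n_iter=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For a thin-tailed target such as one proportional to $\exp(-x^4)$, the gradient grows like $x^3$, so the Langevin proposal can overshoot badly far from the mode. This is the kind of regime the abstract refers to.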
CO May 23, 2019
Efficient MCMC Sampling with Dimension-Free Convergence Rate using ADMM-type Splitting
Maxime Vono, Daniel Paulin, Arnaud Doucet
Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large datasets and high-dimensional models. A standard approach to mitigate this complexity consists of using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction method of multipliers (ADMM) optimization algorithm. These methods appear to provide empirically state-of-the-art performance but their theoretical behavior in high dimension is currently unknown. In this paper, we propose a detailed theoretical study of one of these algorithms known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations.
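The splitting idea behind the split Gibbs sampler can be illustrated on a toy problem. The sketch below assumes a one-dimensional target proportional to $\exp(-f(\theta) - g(\theta))$ with Gaussian potentials, and the commonly used augmented density proportional to $\exp(-f(\theta) - g(z) - (\theta - z)^2 / (2\rho^2))$. The function name, parameter names, and numerical values are hypothetical, and the paper's exact setup may differ.

```python
import math
import random

def split_gibbs_gaussian(a, s1, b, s2, rho, n_iter, seed=0):
    """Toy split Gibbs sampler with f(theta) = (theta - a)^2 / (2 s1^2)
    and g(z) = (z - b)^2 / (2 s2^2), coupled through (theta - z)^2 / (2 rho^2).

    Both conditionals are Gaussian, so each Gibbs step is an exact draw.
    """
    rng = random.Random(seed)
    theta, z = a, b
    prec_theta = 1.0 / s1 ** 2 + 1.0 / rho ** 2  # precision of theta | z
    prec_z = 1.0 / s2 ** 2 + 1.0 / rho ** 2      # precision of z | theta
    samples = []
    for _ in range(n_iter):
        mean_theta = (a / s1 ** 2 + z / rho ** 2) / prec_theta
        theta = rng.gauss(mean_theta, math.sqrt(1.0 / prec_theta))
        mean_z = (b / s2 ** 2 + theta / rho ** 2) / prec_z
        z = rng.gauss(mean_z, math.sqrt(1.0 / prec_z))
        samples.append(theta)
    return samples

# Assumed toy values.
a, s1, b, s2, rho = 0.0, 1.0, 2.0, 1.0, 0.5
samples = split_gibbs_gaussian(a, s1, b, s2, rho, n_iter=20000)
mean = sum(samples) / len(samples)

# Exact mean of the theta-marginal of the augmented density in this Gaussian case:
# the product of N(a, s1^2) and N(b, s2^2 + rho^2).
w1, w2 = 1.0 / s1 ** 2, 1.0 / (s2 ** 2 + rho ** 2)
augmented_mean = (a * w1 + b * w2) / (w1 + w2)
```

In this example the augmented marginal mean is about 0.889, whereas the unsplit target (the product of the two unit-variance Gaussians) has mean 1.0. The gap shrinks as `rho` decreases, at the price of stronger coupling between `theta` and `z` and therefore slower Gibbs mixing.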
OC Sep 13, 2018
Hamiltonian Descent Methods
Chris J. Maddison, Daniel Paulin, Yee Whye Teh et al.
We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are discretizations of conformal Hamiltonian dynamics, which generalize the classical momentum method to model the motion of a particle with non-standard kinetic energy exposed to a dissipative force and the gradient field of the function of interest. They are first-order in the sense that they require only gradient computation. Yet, crucially, the kinetic gradient map can be designed to incorporate information about the convex conjugate in a fashion that allows for linear convergence on convex functions that may be non-smooth or non-strongly convex. We study in detail one implicit and two explicit methods. For one explicit method, we provide conditions under which it converges to stationary points of non-convex functions. For all, we provide conditions on the convex function and kinetic energy pair that guarantee linear convergence, and show that these conditions can be satisfied by functions with power growth. In sum, these methods expand the class of convex functions on which linear convergence is possible with first-order computation.
SY Apr 17, 2015
Probabilistic verification of partially observable dynamical systems
Benjamin M. Gyori, Daniel Paulin, Sucheendra K. Palaniappan
The construction and formal verification of dynamical models is important in engineering, biology and other disciplines. We focus on non-linear models containing a set of parameters governing their dynamics. The values of these parameters are often unknown and not directly observable through measurements, which are themselves noisy. When treating parameters as random variables, one can constrain their distribution by conditioning on observations, thereby constructing a posterior probability distribution. We aim to perform model verification with respect to this posterior. The main difficulty in performing verification on a model under the posterior distribution is that, in general, it is difficult to obtain independent samples from the posterior, especially for non-linear dynamical models. Standard statistical model checking methods require independent realizations of the system and are therefore not applicable in this context. We propose a Markov chain Monte Carlo based statistical model checking framework, which produces a sequence of dependent random realizations of the model dynamics over the parameter posterior. Using this sequence of samples, we apply statistical hypothesis tests to verify whether the model satisfies a bounded temporal logic property with a certain probability. We use sample size bounds tailored to the setting of dependent samples for fixed sample size and sequential tests. We apply our method to a case study from the domain of systems biology, a model of the JAK-STAT biochemical pathway. The pathway is modeled as a system of non-linear ODEs containing a set of unknown parameters. Noisy, indirect observations of the system state are available from an experiment. The results show that the proposed method enables probabilistic verification with respect to the parameter posterior with specified error bounds.