MLSep 10, 2022
Batch Bayesian Optimization via Particle Gradient FlowsEnrico Crovini, Simon L. Cotter, Konstantinos Zygalakis et al.
Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancilliary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this `inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.
COSep 23, 2024
Bayesian computation with generative diffusion models by Multilevel Monte CarloAbdul-Lateef Haji-Ali, Marcelo Pereyra, Luke Shaw et al.
Generative diffusion models have recently emerged as a powerful strategy to perform stochastic sampling in Bayesian inverse problems, delivering remarkably accurate solutions for a wide range of challenging applications. However, diffusion models often require a large number of neural function evaluations per sample in order to deliver accurate posterior samples. As a result, using diffusion models as stochastic samplers for Monte Carlo integration in Bayesian computation can be highly computationally expensive, particularly in applications that require a substantial number of Monte Carlo samples for conducting uncertainty quantification analyses. This cost is especially high in large-scale inverse problems such as computational imaging, which rely on large neural networks that are expensive to evaluate. With quantitative imaging applications in mind, this paper presents a Multilevel Monte Carlo strategy that significantly reduces the cost of Bayesian computation with diffusion models. This is achieved by exploiting cost-accuracy trade-offs inherent to diffusion models to carefully couple models of different levels of accuracy in a manner that significantly reduces the overall cost of the calculation, without reducing the final accuracy. The proposed approach achieves a $4\times$-to-$8\times$ reduction in computational cost w.r.t. standard techniques across three benchmark imaging problems.
LGFeb 24
Deep unfolding of MCMC kernels: scalable, modular & explainable GANs for high-dimensional posterior samplingJonathan Spence, Tobías I. Liaudat, Konstantinos Zygalakis et al.
Markov chain Monte Carlo (MCMC) methods are fundamental to Bayesian computation, but can be computationally intensive, especially in high-dimensional settings. Push-forward generative models, such as generative adversarial networks (GANs), variational auto-encoders and normalising flows offer a computationally efficient alternative for posterior sampling. However, push-forward models are opaque as they lack the modularity of Bayes Theorem, leading to poor generalisation with respect to changes in the likelihood function. In this work, we introduce a novel approach to GAN architecture design by applying deep unfolding to Langevin MCMC algorithms. This paradigm maps fixed-step iterative algorithms onto modular neural networks, yielding architectures that are both flexible and amenable to interpretation. Crucially, our design allows key model parameters to be specified at inference time, offering robustness to changes in the likelihood parameters. We train these unfolded samplers end-to-end using a supervised regularized Wasserstein GAN framework for posterior sampling. Through extensive Bayesian imaging experiments, we demonstrate that our proposed approach achieves high sampling accuracy and excellent computational efficiency, while retaining the physics consistency, adaptability and interpretability of classical MCMC strategies.
MESep 1, 2025
Sampling as Bandits: Evaluation-Efficient Design for Black-Box DensitiesTakuo Matsubara, Andrew Duncan, Simon Cotter et al.
We introduce bandit importance sampling (BIS), a new class of importance sampling methods designed for settings where the target density is expensive to evaluate. In contrast to adaptive importance sampling, which optimises a proposal distribution, BIS directly designs the samples through a sequential strategy that combines space-filling designs with multi-armed bandits. Our method leverages Gaussian process surrogates to guide sample selection, enabling efficient exploration of the parameter space with minimal target evaluations. We establish theoretical guarantees on convergence and demonstrate the effectiveness of the method across a broad range of sampling tasks. BIS delivers accurate approximations with fewer target evaluations, outperforming competing approaches across multimodal, heavy-tailed distributions, and real-world applications to Bayesian inference of computationally expensive models.
MLMay 28, 2025
Hypothesis Testing in Imaging Inverse ProblemsYiming Xi, Konstantinos Zygalakis, Marcelo Pereyra
This paper proposes a framework for semantic hypothesis testing tailored to imaging inverse problems. Modern imaging methods struggle to support hypothesis testing, a core component of the scientific method that is essential for the rigorous interpretation of experiments and robust interfacing with decision-making processes. There are three main reasons why image-based hypothesis testing is challenging. First, the difficulty of using a single observation to simultaneously reconstruct an image, formulate hypotheses, and quantify their statistical significance. Second, the hypotheses encountered in imaging are mostly of semantic nature, rather than quantitative statements about pixel values. Third, it is challenging to control test error probabilities because the null and alternative distributions are often unknown. Our proposed approach addresses these difficulties by leveraging concepts from self-supervised computational imaging, vision-language models, and non-parametric hypothesis testing with e-values. We demonstrate our proposed framework through numerical experiments related to image-based phenotyping, where we achieve excellent power while robustly controlling Type I errors.
MEJun 8, 2017
The True Cost of Stochastic Gradient Langevin DynamicsTigran Nagapetyan, Andrew B. Duncan, Leonard Hasenclever et al.
The problem of posterior inference is central to Bayesian statistics and a wealth of Markov Chain Monte Carlo (MCMC) methods have been proposed to obtain asymptotically correct samples from the posterior. As datasets in applications grow larger and larger, scalability has emerged as a central problem for MCMC methods. Stochastic Gradient Langevin Dynamics (SGLD) and related stochastic gradient Markov Chain Monte Carlo methods offer scalability by using stochastic gradients in each step of the simulated dynamics. While these methods are asymptotically unbiased if the stepsizes are reduced in an appropriate fashion, in practice constant stepsizes are used. This introduces a bias that is often ignored. In this paper we study the mean squared error of Lipschitz functionals in strongly log- concave models with i.i.d. data of growing data set size and show that, given a batchsize, to control the bias of SGLD the stepsize has to be chosen so small that the computational cost of reaching a target accuracy is roughly the same for all batchsizes. Using a control variate approach, the cost can be reduced dramatically. The analysis is performed by considering the algorithms as noisy discretisations of the Langevin SDE which correspond to the Euler method if the full data set is used. An important observation is that the 1scale of the step size is determined by the stability criterion if the accuracy is required for consistent credible intervals. Experimental results confirm our theoretical findings.
NAOct 26, 2016
Probabilistic Linear Multistep MethodsOnur Teymur, Konstantinos Zygalakis, Ben Calderhead
We present a derivation and theoretical investigation of the Adams-Bashforth and Adams-Moulton family of linear multistep methods for solving ordinary differential equations, starting from a Gaussian process (GP) framework. In the limit, this formulation coincides with the classical deterministic methods, which have been used as higher-order initial value problem solvers for over a century. Furthermore, the natural probabilistic framework provided by the GP formulation allows us to derive probabilistic versions of these methods, in the spirit of a number of other probabilistic ODE solvers presented in the recent literature. In contrast to higher-order Runge-Kutta methods, which require multiple intermediate function evaluations per step, Adams family methods make use of previous function evaluations, so that increased accuracy arising from a higher-order multistep approach comes at very little additional computational cost. We show that through a careful choice of covariance function for the GP, the posterior mean and standard deviation over the numerical solution can be made to exactly coincide with the value given by the deterministic method and its local truncation error respectively. We provide a rigorous proof of the convergence of these new methods, as well as an empirical investigation (up to fifth order) demonstrating their convergence rates in practice.
MLSep 15, 2016
Multilevel Monte Carlo for Scalable Bayesian ComputationsMike Giles, Tigran Nagapetyan, Lukasz Szpruch et al.
Markov chain Monte Carlo (MCMC) algorithms are ubiquitous in Bayesian computations. However, they need to access the full data set in order to evaluate the posterior density at every step of the algorithm. This results in a great computational burden in big data applications. In contrast to MCMC methods, Stochastic Gradient MCMC (SGMCMC) algorithms such as the Stochastic Gradient Langevin Dynamics (SGLD) only require access to a batch of the data set at every step. This drastically improves the computational performance and scales well to large data sets. However, the difficulty with SGMCMC algorithms comes from the sensitivity to its parameters which are notoriously difficult to tune. Moreover, the Root Mean Square Error (RMSE) scales as $\mathcal{O}(c^{-\frac{1}{3}})$ as opposed to standard MCMC $\mathcal{O}(c^{-\frac{1}{2}})$ where $c$ is the computational cost. We introduce a new class of Multilevel Stochastic Gradient Markov chain Monte Carlo algorithms that are able to mitigate the problem of tuning the step size and more importantly of recovering the $\mathcal{O}(c^{-\frac{1}{2}})$ convergence of standard Markov Chain Monte Carlo methods without the need to introduce Metropolis-Hasting steps. A further advantage of this new class of algorithms is that it can easily be parallelised over a heterogeneous computer architecture. We illustrate our methodology using Bayesian logistic regression and provide numerical evidence that for a prescribed relative RMSE the computational cost is sublinear in the number of data items.
NAMay 4, 2016
Multilevel Monte Carlo methods for the approximation of invariant measures of stochastic differential equationsMichael B. Giles, Mateusz B. Majka, Lukasz Szpruch et al.
We develop a framework that allows the use of the multi-level Monte Carlo (MLMC) methodology (Giles2015) to calculate expectations with respect to the invariant measure of an ergodic SDE. In that context, we study the (over-damped) Langevin equations with a strongly concave potential. We show that, when appropriate contracting couplings for the numerical integrators are available, one can obtain a uniform in time estimate of the MLMC variance in contrast to the majority of the results in the MLMC literature. As a consequence, a root mean square error of $\mathcal{O}(\varepsilon)$ is achieved with $\mathcal{O}(\varepsilon^{-2})$ complexity on par with Markov Chain Monte Carlo (MCMC) methods, which however can be computationally intensive when applied to large data sets. Finally, we present a multi-level version of the recently introduced Stochastic Gradient Langevin Dynamics (SGLD) method (Welling and Teh, 2011) built for large datasets applications. We show that this is the first stochastic gradient MCMC method with complexity $\mathcal{O}(\varepsilon^{-2}|\log {\varepsilon}|^{3})$, in contrast to the complexity $\mathcal{O}(\varepsilon^{-3})$ of currently available methods. Numerical experiments confirm our theoretical findings.