COJul 15, 2014
Data-Driven Model Reduction for the Bayesian Solution of Inverse ProblemsTiangang Cui, Youssef M. Marzouk, Karen E. Willcox
One of the major challenges in the Bayesian solution of inverse problems governed by partial differential equations (PDEs) is the computational cost of repeatedly evaluating numerical PDE models, as required by Markov chain Monte Carlo (MCMC) methods for posterior sampling. This paper proposes a data-driven projection-based model reduction technique to reduce this computational cost. The proposed technique has two distinctive features. First, the model reduction strategy is tailored to inverse problems: the snapshots used to construct the reduced-order model are computed adaptively from the posterior distribution. Posterior exploration and model reduction are thus pursued simultaneously. Second, to avoid repeated evaluations of the full-scale numerical model as in a standard MCMC method, we couple the full-scale model and the reduced-order model together in the MCMC algorithm. This maintains accurate inference while reducing its overall computational cost. In numerical experiments considering steady-state flow in a porous medium, the data-driven reduced-order model achieves better accuracy than a reduced-order model constructed using the classical approach. It also improves posterior sampling efficiency by several orders of magnitude compared to a standard MCMC method.
COJul 15, 2014
Likelihood-informed dimension reduction for nonlinear inverse problemsTiangang Cui, James Martin, Youssef M. Marzouk et al.
The intrinsic dimensionality of an inverse problem is affected by prior information, the accuracy and number of observations, and the smoothing properties of the forward operator. From a Bayesian perspective, changes from the prior to the posterior may, in many problems, be confined to a relatively low-dimensional subspace of the parameter space. We present a dimension reduction approach that defines and identifies such a subspace, called the "likelihood-informed subspace" (LIS), by characterizing the relative influences of the prior and the likelihood over the support of the posterior distribution. This identification enables new and more efficient computational methods for Bayesian inference with nonlinear forward models and Gaussian priors. In particular, we approximate the posterior distribution as the product of a lower-dimensional posterior defined on the LIS and the prior distribution marginalized onto the complementary subspace. Markov chain Monte Carlo sampling can then proceed in lower dimensions, with significant gains in computational efficiency. We also introduce a Rao-Blackwellization strategy that de-randomizes Monte Carlo estimates of posterior expectations for additional variance reduction. We demonstrate the efficiency of our methods using two numerical examples: inference of permeability in a groundwater system governed by an elliptic PDE, and an atmospheric remote sensing problem based on Global Ozone Monitoring System (GOMOS) observations.
COApr 30, 2016
Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reductionTiangang Cui, Youssef M. Marzouk, Karen E. Willcox
Two major bottlenecks to the solution of large-scale Bayesian inverse problems are the scaling of posterior sampling algorithms to high-dimensional parameter spaces and the computational cost of forward model evaluations. Yet incomplete or noisy data, the state variation and parameter dependence of the forward model, and correlations in the prior collectively provide useful structure that can be exploited for dimension reduction in this setting--both in the parameter space of the inverse problem and in the state space of the forward model. To this end, we show how to jointly construct low-dimensional subspaces of the parameter space and the state space in order to accelerate the Bayesian solution of the inverse problem. As a byproduct of state dimension reduction, we also show how to identify low-dimensional subspaces of the data in problems with high-dimensional observations. These subspaces enable approximation of the posterior as a product of two factors: (i) a projection of the posterior onto a low-dimensional parameter subspace, wherein the original likelihood is replaced by an approximation involving a reduced model; and (ii) the marginal prior distribution on the high-dimensional complement of the parameter subspace. We present and compare several strategies for constructing these subspaces using only a limited number of forward and adjoint model simulations. The resulting posterior approximations can rapidly be characterized using standard sampling techniques, e.g., Markov chain Monte Carlo. Two numerical examples demonstrate the accuracy and efficiency of our approach: inversion of an integral equation in atmospheric remote sensing, where the data dimension is very high; and the inference of a heterogeneous transmissivity field in a groundwater system, which involves a partial differential equation forward model with high dimensional state and parameters.
MEMar 14, 2017
Goal-oriented optimal approximations of Bayesian linear inverse problemsAlessio Spantini, Tiangang Cui, Karen Willcox et al.
We propose optimal dimensionality reduction techniques for the solution of goal-oriented linear-Gaussian inverse problems, where the quantity of interest (QoI) is a function of the inversion parameters. These approximations are suitable for large-scale applications. In particular, we study the approximation of the posterior covariance of the QoI as a low-rank negative update of its prior covariance, and prove optimality of this update with respect to the natural geodesic distance on the manifold of symmetric positive definite matrices. Assuming exact knowledge of the posterior mean of the QoI, the optimality results extend to optimality in distribution with respect to the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose approximation of the posterior mean of the QoI as a low-rank linear function of the data, and prove optimality of this approximation with respect to a weighted Bayes risk. Both of these optimal approximations avoid the explicit computation of the full posterior distribution of the parameters and instead focus on directions that are well informed by the data and relevant to the QoI. These directions stem from a balance among all the components of the goal-oriented inverse problem: prior information, forward model, measurement noise, and ultimate goals. We illustrate the theory using a high-dimensional inverse problem in heat transfer.
66.1OCMay 4
Subspace accelerated measure transport methods for fast and scalable sequential experimental design, with application to photoacoustic imagingTiangang Cui, Karina Koval, Roland Herzog et al.
We propose a novel approach for sequential optimal experimental design (sOED) for Bayesian inverse problems involving expensive models with high-dimensional unknown parameters. This work focuses on designs that maximize the expected information gain (EIG) from prior to posterior, a task that is computationally very challenging in non-Gaussian settings. This challenge is amplified in sOED, as the incremental expected information gain (iEIG) must be repeatedly approximated across distinct stages, with both prior and posterior distributions being intractable. To address this, we derive a general-purpose, derivative-based upper bound for the iEIG, which not only guides design placement but also enables the construction of projectors onto likelihood-informed subspaces, facilitating parameter dimension reduction. By combining this approach with conditional measure transport maps for the sequence of posteriors, we develop a unified sOED and amortized inference framework scalable to high- and infinite-dimensional problems. Numerical experiments for two inverse problems governed by partial differential equations (PDEs) demonstrate the effectiveness of designs by maximizing the proposed bound.
MLSep 5, 2022
Deep importance sampling using tensor trains with application to a priori and a posteriori rare event estimationTiangang Cui, Sergey Dolgov, Robert Scheichl
We propose a deep importance sampling method that is suitable for estimating rare event probabilities in high-dimensional problems. We approximate the optimal importance distribution in a general importance sampling problem as the pushforward of a reference distribution under a composition of order-preserving transformations, in which each transformation is formed by a squared tensor-train decomposition. The squared tensor-train decomposition provides a scalable ansatz for building order-preserving high-dimensional transformations via density approximations. The use of composition of maps moving along a sequence of bridging densities alleviates the difficulty of directly approximating concentrated density functions. To compute expectations over unnormalized probability distributions, we design a ratio estimator that estimates the normalizing constant using a separate importance distribution, again constructed via a composition of transformations in tensor-train format. This offers better theoretical variance reduction compared with self-normalized importance sampling, and thus opens the door to efficient computation of rare event probabilities in Bayesian inference problems. Numerical experiments on problems constrained by differential equations show little to no increase in the computational complexity with the event probability going to zero, and allow to compute hitherto unattainable estimates of rare event probabilities for complex, high-dimensional posterior densities.
37.3CEApr 20
Multiscale Structural Reliability Analysis in high dimensions with Tensor Trains and Physics-Augmented Neural NetworksAryan Tyagi, Alex de Beer, Tiangang Cui et al.
Structural reliability evaluation for composites constitutes a fundamentally high-dimensional multiscale problem, as microscale material uncertainties must propagate to the macroscale and can be quantified as high-dimensional random fields. Conventional approaches are computationally intractable, as they rely on repeatedly solving coupled partial differential equation systems across scales while contending with the exponential complexity inherent in high-dimensional uncertainty quantification. This work introduces a scalable and physically consistent framework that addresses both bottlenecks simultaneously in the case of separation of scales and (anisotropic) linear elasticity. In particular, we couple a physics-augmented Voigt--Reuss Neural Network (VRNN) with the Deep Inverse Rosenblatt Transport (DIRT) method to estimate the posterior probability of structural failure. The VRNN is used to resolve the computationally expensive FE$^2$ scheme by providing a near-instantaneous evaluation of the homogenized stiffness tensor that is guaranteed to be symmetric, positive-definite, and strictly bounded within the Voigt--Reuss limits, enabling fast evaluation of the homogenized responses. The DIRT method constructs a sequence of functional tensor train approximations to efficiently store an approximation of the high-dimensional optimal importance sampling distribution for estimating the probability of failure. This mitigates the curse of dimensionality arising from the Karhunen--Loève expansion of the random fields. The framework is demonstrated on a three-dimensional heterogeneous benchmark problem, where the uncertainty in the microscale material properties is characterized by a Bayesian posterior distribution obtained from limited strain observations. Our results show that the proposed framework can provide low-variance estimates of failure probabilities in dimensions up to 150.
COMP-PHSep 5, 2022
A variational neural network approach for glacier modelling with nonlinear rheologyTiangang Cui, Zhongjian Wang, Zhiwen Zhang
In this paper, we propose a mesh-free method to solve full stokes equation which models the glacier movement with nonlinear rheology. Our approach is inspired by the Deep-Ritz method proposed in [12]. We first formulate the solution of non-Newtonian ice flow model into the minimizer of a variational integral with boundary constraints. The solution is then approximated by a deep neural network whose loss function is the variational integral plus soft constraint from the mixed boundary conditions. Instead of introducing mesh grids or basis functions to evaluate the loss function, our method only requires uniform samplers of the domain and boundaries. To address instability in real-world scaling, we re-normalize the input of the network at the first layer and balance the regularizing factors for each individual boundary. Finally, we illustrate the performance of our method by several numerical experiments, including a 2D model with analytical solution, Arolla glacier model with real scaling and a 3D model with periodic boundary conditions. Numerical results show that our proposed method is efficient in solving the non-Newtonian mechanics arising from glacier modeling with nonlinear rheology.
92.2MLApr 8
Amortized Filtering and Smoothing with Conditional Normalizing FlowsTiangang Cui, Xiaodong Feng, Chenlong Pei et al.
Bayesian filtering and smoothing for high-dimensional nonlinear dynamical systems are fundamental yet challenging problems in many areas of science and engineering. In this work, we propose AFSF, a unified amortized framework for filtering and smoothing with conditional normalizing flows. The core idea is to encode each observation history into a fixed-dimensional summary statistic and use this shared representation to learn both a forward flow for the filtering distribution and a backward flow for the backward transition kernel. Specifically, a recurrent encoder maps each observation history to a fixed-dimensional summary statistic whose dimension does not depend on the length of the time series. Conditioned on this shared summary statistic, the forward flow approximates the filtering distribution, while the backward flow approximates the backward transition kernel. The smoothing distribution over an entire trajectory is then recovered by combining the terminal filtering distribution with the learned backward flow through the standard backward recursion. By learning the underlying temporal evolution structure, AFSF also supports extrapolation beyond the training horizon. Moreover, by coupling the two flows through shared summary statistics, AFSF induces an implicit regularization across latent state trajectories and improves trajectory-level smoothing. In addition, we develop a flow-based particle filtering variant that provides an alternative filtering procedure and enables ESS-based diagnostics when explicit model factors are available. Numerical experiments demonstrate that AFSF provides accurate approximations of both filtering distributions and smoothing paths.
MLFeb 27, 2024
Sequential transport maps using SoS density estimation and $α$-divergencesBenjamin Zanger, Olivier Zahm, Tiangang Cui et al.
Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further invertigate the sequential transport maps framework proposed from arXiv:2106.04170 arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of those maps are built by first estimating an intermediate density of moderate complexity, and then by computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $α$-divergences for approximating the intermediate densities. Combining SoS densities with $α$-divergence interestingly yields convex optimization problems which can be efficiently solved using semidefinite programming. The main advantage of $α$-divergences is to enable working with unnormalized densities, which provides benefits both numerically and theoretically. In particular, we provide a new convergence analyses of the sequential transport maps based on information geometric properties of $α$-divergences. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state-of-the-art, we introduce diffusion-based intermediate densities which permits to approximate densities known from samples only. Such intermediate densities are well-established in machine learning for generative modeling. Finally we propose low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on Bayesian inference problems and unsupervised learning tasks.
MLOct 27, 2024
Low-rank Bayesian matrix completion via geodesic Hamiltonian Monte Carlo on Stiefel manifoldsTiangang Cui, Alex Gorodetsky
We present a new sampling-based approach for enabling efficient computation of low-rank Bayesian matrix completion and quantifying the associated uncertainty. Firstly, we design a new prior model based on the singular-value-decomposition (SVD) parametrization of low-rank matrices. Our prior is analogous to the seminal nuclear-norm regularization used in non-Bayesian setting and enforces orthogonality in the factor matrices by constraining them to Stiefel manifolds. Then, we design a geodesic Hamiltonian Monte Carlo (-within-Gibbs) algorithm for generating posterior samples of the SVD factor matrices. We demonstrate that our approach resolves the sampling difficulties encountered by standard Gibbs samplers for the common two-matrix factorization used in matrix completion. More importantly, the geodesic Hamiltonian sampler allows for sampling in cases with more general likelihoods than the typical Gaussian likelihood and Gaussian prior assumptions adopted in most of the existing Bayesian matrix completion literature. We demonstrate an applications of our approach to fit the categorical data of a mice protein dataset and the MovieLens recommendation problem. Numerical examples demonstrate superior sampling performance, including better mixing and faster convergence to a stationary distribution. Moreover, they demonstrate improved accuracy on the two real-world benchmark problems we considered.
MLJun 18, 2024
Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalitiesMatthew T. C. Li, Tiangang Cui, Fengyi Li et al.
Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback--Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the \emph{dimensional} logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.
MLJun 8, 2021
Conditional Deep Inverse Rosenblatt TransportsTiangang Cui, Sergey Dolgov, Olivier Zahm
We present a novel offline-online method to mitigate the computational burden of Bayesian inference, particularly in the regime where the posterior densities are computationally demanding to evaluate while real-time inference results are needed. In the offline phase, the proposed method learns the joint law of the parameter random variables and the observable random variables in the tensor-train (TT) format. Then, in the online phase, the resulting order-preserving transport can be conditioned on newly observed data to characterize the posterior random variables in real-time. Compared with the state-of-the-art normalizing flows techniques, our proposed method relies on function approximation, for which we can provide a thorough performance analysis. The function approximation perspective allows us to significantly improve the capability of transport maps in challenging problems with high-dimensional observations and high-dimensional parameters. Capitalizing on this, we present novel heuristics to either reorder or reparametrize the variables to enhance the approximation power of TT. We then integrate the TT-based transport maps and the parameter reordering/reparametrization into a layered composite map to further improve the performance of the resulting inference. We demonstrate the efficiency of the proposed method on various statistical learning tasks involving ordinary differential equations (ODEs) and partial differential equations (PDEs).
NCJan 26, 2021
Identification of brain states, transitions, and communities using functional MRILingbin Bian, Tiangang Cui, B. T. Thomas Yeo et al.
Brain function relies on a precisely coordinated and dynamic balance between the functional integration and segregation of distinct neural systems. Characterizing the way in which neural systems reconfigure their interactions to give rise to distinct but hidden brain states remains an open challenge. In this paper, we propose a Bayesian model-based characterization of latent brain states and showcase a novel method based on posterior predictive discrepancy using the latent block model to detect transitions between latent brain states in blood oxygen level-dependent (BOLD) time series. The set of estimated parameters in the model includes a latent label vector that assigns network nodes to communities, and also block model parameters that reflect the weighted connectivity within and between communities. Besides extensive in-silico model evaluation, we also provide empirical validation (and replication) using the Human Connectome Project (HCP) dataset of 100 healthy adults. Our results obtained through an analysis of task-fMRI data during working memory performance show appropriate lags between external task demands and change-points between brain states, with distinctive community patterns distinguishing fixation, low-demand and high-demand task conditions.
MLJul 14, 2020
Deep composition of tensor-trains using squared inverse Rosenblatt transportsTiangang Cui, Sergey Dolgov
Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the functional tensor-train approximation of the inverse Rosenblatt transport recently developed by Dolgov et al. (Stat Comput 30:603--625, 2020) to a wide class of high-dimensional non-negative functions, such as unnormalised probability density functions. First, we extend the inverse Rosenblatt transform to enable the transport to general reference measures other than the uniform measure. We develop an efficient procedure to compute this transport from a squared tensor-train decomposition which preserves the monotonicity. More crucially, we integrate the proposed order-preserving functional tensor-train transport into a nested variable transformation framework inspired by the layered structure of deep neural networks. The resulting deep inverse Rosenblatt transport significantly expands the capability of tensor approximations and transport maps to random variables with complicated nonlinear interactions and concentrated density functions. We demonstrate the efficiency of the proposed approach on a range of applications in statistical learning and uncertainty quantification, including parameter estimation for dynamical systems and inverse problems constrained by partial differential equations.
COFeb 15, 2020
Optimization-Based MCMC Methods for Nonlinear Hierarchical Statistical Inverse ProblemsJohnathan Bardsley, Tiangang Cui
In many hierarchical inverse problems, not only do we want to estimate high- or infinite-dimensional model parameters in the parameter-to-observable maps, but we also have to estimate hyperparameters that represent critical assumptions in the statistical and mathematical modeling processes. As a joint effect of high-dimensionality, nonlinear dependence, and non-concave structures in the joint posterior posterior distribution over model parameters and hyperparameters, solving inverse problems in the hierarchical Bayesian setting poses a significant computational challenge. In this work, we aim to develop scalable optimization-based Markov chain Monte Carlo (MCMC) methods for solving hierarchical Bayesian inverse problems with nonlinear parameter-to-observable maps and a broader class of hyperparameters. Our algorithmic development is based on the recently developed scalable randomize-then-optimize (RTO) method [4] for exploring the high- or infinite-dimensional model parameter space. By using RTO either as a proposal distribution in a Metropolis-within-Gibbs update or as a biasing distribution in the pseudo-marginal MCMC [2], we are able to design efficient sampling tools for hierarchical Bayesian inversion. In particular, the integration of RTO and the pseudo-marginal MCMC has sampling performance robust to model parameter dimensions. We also extend our methods to nonlinear inverse problems with Poisson-distributed measurements. Numerical examples in PDE-constrained inverse problems and positron emission tomography (PET) are used to demonstrate the performance of our methods.
MLJan 23, 2019
Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural NetworksGianluca Detommaso, Hanne Hoitzing, Tiangang Cui et al.
Bayesian online changepoint detection (BOCPD) (Adams & MacKay, 2007) offers a rigorous and viable way to identify changepoints in complex systems. In this work, we introduce a Stein variational online changepoint detection (SVOCD) method to provide a computationally tractable generalization of BOCPD beyond the exponential family of probability distributions. We integrate the recently developed Stein variational Newton (SVN) method (Detommaso et al., 2018) and BOCPD to offer a full online Bayesian treatment for a large number of situations with significant importance in practice. We apply the resulting method to two challenging and novel applications: Hawkes processes and long short-term memory (LSTM) neural networks. In both cases, we successfully demonstrate the efficacy of our method on real data.
MLJun 8, 2018
A Stein variational Newton methodGianluca Detommaso, Tiangang Cui, Alessio Spantini et al.
Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
COOct 17, 2015
Dimension-independent likelihood-informed MCMCTiangang Cui, Kody J. H. Law, Youssef M. Marzouk
Many Bayesian inference problems require exploring the posterior distribution of high-dimensional parameters that represent the discretization of an underlying function. This work introduces a family of Markov chain Monte Carlo (MCMC) samplers that can adapt to the particular structure of a posterior distribution over functions. Two distinct lines of research intersect in the methods developed here. First, we introduce a general class of operator-weighted proposal distributions that are well defined on function space, such that the performance of the resulting MCMC samplers is independent of the discretization of the function. Second, by exploiting local Hessian information and any associated low-dimensional structure in the change from prior to posterior distributions, we develop an inhomogeneous discretization scheme for the Langevin stochastic differential equation that yields operator-weighted proposals adapted to the non-Gaussian structure of the posterior. The resulting dimension-independent, likelihood-informed (DILI) MCMC samplers may be useful for a large class of high-dimensional problems where the target probability measure has a density with respect to a Gaussian reference measure. Two nonlinear inverse problems are used to demonstrate the efficiency of these DILI samplers: an elliptic PDE coefficient inverse problem and path reconstruction in a conditioned diffusion.
NAJul 6, 2015
Optimal low-rank approximations of Bayesian linear inverse problemsAlessio Spantini, Antti Solonen, Tiangang Cui et al.
In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the Förstner metric for symmetric positive definite matrices, as well as the Kullback-Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are deployed in an offline-online manner, where a more costly but data-independent offline calculation is followed by fast online evaluations. As a result, these approximations are particularly useful when repeated posterior mean evaluations are required for multiple data sets. We demonstrate our theoretical results with several numerical examples, including high-dimensional X-ray tomography and an inverse heat conduction problem. In both of these examples, the intrinsic low-dimensional structure of the inference problem can be exploited while producing results that are essentially indistinguishable from solutions computed in the full space.