NA Jun 21, 2022
Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning
Thomas O'Leary-Roseberry, Peng Chen, Umberto Villa et al.
We propose derivative-informed neural operators (DINOs), a general family of neural networks to approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest. After discretization, both inputs and outputs are high-dimensional. We aim to approximate not only the operators with improved accuracy but also their derivatives (Jacobians) with respect to the input function-valued parameter to empower derivative-based algorithms in many applications, e.g., Bayesian inverse problems, optimization under parameter uncertainty, and optimal experimental design. The major difficulties include the computational cost of generating derivative training data and the high dimensionality of the problem, which leads to large training cost. To address these challenges, we exploit the intrinsic low-dimensionality of the derivatives and develop algorithms for compressing derivative information and efficiently imposing it in neural operator training, yielding derivative-informed neural operators. We demonstrate that these advances can significantly reduce the costs of both data generation and training for large classes of problems (e.g., nonlinear steady state parametric PDE maps), making the costs marginal or comparable to the costs without using derivatives, and in particular independent of the discretization dimension of the input and output functions. Moreover, we show that the proposed DINO achieves significantly higher accuracy than neural operators trained without derivative information, for both function approximation and derivative approximation (e.g., Gauss-Newton Hessian), especially when the training data are limited.
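As a hedged illustration of what a derivative-informed training objective could look like, the display below writes a generic loss that penalizes both output and Jacobian misfit. All symbols are assumptions introduced here and not notation taken from the abstract: $F$ is the discretized operator, $F_w$ the neural operator with weights $w$, $\mu$ the input distribution, $D$ the derivative with respect to the input $m$, and $U_r$, $V_r$ are hypothetical orthonormal reduced bases for the outputs and inputs.

$$
\min_w \; \mathbb{E}_{m \sim \mu}\Big[ \|F(m) - F_w(m)\|_2^2 + \|DF(m) - DF_w(m)\|_F^2 \Big]
$$

One way the compression mentioned in the abstract might enter is by replacing the full Jacobian misfit with a reduced one, $\|U_r^T \big(DF(m) - DF_w(m)\big) V_r\|_F^2$, whose size depends on the reduced ranks and not on the discretization dimensions. This is a sketch under those assumptions, not the paper's exact formulation.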
NA Oct 6, 2022
Residual-based error correction for neural operator accelerated infinite-dimensional Bayesian inverse problems
Lianghao Cao, Thomas O'Leary-Roseberry, Prashant K. Jha et al.
We explore using neural operators, or neural network representations of nonlinear maps between function spaces, to accelerate infinite-dimensional Bayesian inverse problems (BIPs) with models governed by nonlinear parametric partial differential equations (PDEs). Neural operators have gained significant attention in recent years for their ability to approximate the parameter-to-solution maps defined by PDEs using as training data solutions of PDEs at a limited number of parameter samples. The computational cost of BIPs can be drastically reduced if the large number of PDE solves required for posterior characterization are replaced with evaluations of trained neural operators. However, reducing error in the resulting BIP solutions via reducing the approximation error of the neural operators in training can be challenging and unreliable. We provide an a priori error bound result that implies certain BIPs can be ill-conditioned to the approximation error of neural operators, thus leading to inaccessible accuracy requirements in training. To reliably deploy neural operators in BIPs, we consider a strategy for enhancing the performance of neural operators, which is to correct the prediction of a trained neural operator by solving a linear variational problem based on the PDE residual. We show that a trained neural operator with error correction can achieve a quadratic reduction of its approximation error, all while retaining substantial computational speedups of posterior sampling when models are governed by highly nonlinear PDEs. The strategy is applied to two numerical examples of BIPs based on a nonlinear reaction--diffusion problem and deformation of hyperelastic materials. We demonstrate that posterior representations of the two BIPs produced using trained neural operators are greatly and consistently enhanced by error correction.
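The quadratic error reduction described above can be illustrated on a toy problem. The sketch below is not the paper's method: a scalar nonlinear equation stands in for the PDE residual, the "surrogate prediction" is just the true solution plus a known error, and the function names are hypothetical. The correction is a single linearized (Newton-type) solve based on the residual at the prediction.

```python
# Toy stand-in for a PDE residual: R(u, m) = u^3 + u - m, solved for u given m.

def residual(u: float, m: float) -> float:
    return u**3 + u - m


def residual_jacobian(u: float) -> float:
    # dR/du = 3u^2 + 1 is always positive, so the linear solve is well posed.
    return 3.0 * u**2 + 1.0


def solve_reference(m: float, tol: float = 1e-14, max_iter: int = 50) -> float:
    """Reference solution by full Newton iteration (the 'expensive' solve)."""
    u = 0.0
    for _ in range(max_iter):
        step = residual(u, m) / residual_jacobian(u)
        u -= step
        if abs(step) < tol:
            break
    return u


def correct(u_pred: float, m: float) -> float:
    """One linear correction of a prediction, driven by its residual."""
    return u_pred - residual(u_pred, m) / residual_jacobian(u_pred)


m = 2.0
u_true = solve_reference(m)

errors_before = [1e-1, 1e-2, 1e-3]
errors_after = []
for err in errors_before:
    u_pred = u_true + err  # hypothetical surrogate prediction with known error
    errors_after.append(abs(correct(u_pred, m) - u_true))

for before, after in zip(errors_before, errors_after):
    print(f"error before: {before:.1e}, after correction: {after:.1e}")
```

In this toy setting each tenfold reduction of the prediction error yields roughly a hundredfold reduction after correction, which is the behavior one expects from a single Newton-type step near the solution.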
OC Mar 3
Shape Derivative-Informed Neural Operators with Application to Risk-Averse Shape Optimization
Xindi Gong, Dingcheng Luo, Thomas O'Leary-Roseberry et al.
Shape optimization under uncertainty (OUU) is computationally intensive for classical PDE-based methods due to the high cost of repeated sampling-based risk evaluation across many uncertainty realizations and varying geometries, while standard neural surrogates often fail to provide accurate and efficient sensitivities for optimization. We introduce Shape-DINO, a derivative-informed neural operator framework for learning PDE solution operators on families of varying geometries, with a particular focus on accelerating PDE-constrained shape OUU. Shape-DINOs encode geometric variability through diffeomorphic mappings to a fixed reference domain and employ a derivative-informed operator learning objective that jointly learns the PDE solution and its Fréchet derivatives with respect to design variables and uncertain parameters, enabling accurate state predictions and reliable gradients for large-scale OUU. We establish a priori error bounds linking surrogate accuracy to optimization error and prove universal approximation results for multi-input reduced basis neural operators in suitable $C^1$ norms. We demonstrate efficiency and scalability on three representative shape OUU problems, including boundary design for a Poisson equation and shape design governed by steady-state Navier-Stokes exterior flows in two and three dimensions. Across these examples, Shape-DINOs produce more reliable optimization results than operator surrogates trained without derivative information. In our examples, Shape-DINOs achieve 3-8 orders-of-magnitude speedups in state and gradient evaluations. Counting training data generation, Shape-DINOs reduce necessary PDE solves by 1-2 orders-of-magnitude compared to a strictly PDE-based approach for a single OUU problem. Moreover, Shape-DINO construction costs can be amortized across many objectives and risk measures, enabling large-scale shape OUU for complex systems.
LG Dec 16, 2025
Derivative-Informed Fourier Neural Operator: Universal Approximation and Applications to PDE-Constrained Optimization
Boyuan Yao, Dingcheng Luo, Lianghao Cao et al.
We present approximation theories and efficient training methods for derivative-informed Fourier neural operators (DIFNOs) with applications to PDE-constrained optimization. A DIFNO is an FNO trained by minimizing its prediction error jointly on output and Fréchet derivative samples of a high-fidelity operator (e.g., a parametric PDE solution operator). As a result, a DIFNO can closely emulate not only the high-fidelity operator's response but also its sensitivities. To motivate the use of DIFNOs instead of conventional FNOs as surrogate models, we show that accurate surrogate-driven PDE-constrained optimization requires accurate surrogate Fréchet derivatives. Then, for continuously differentiable operators, we establish (i) simultaneous universal approximation of FNOs and their Fréchet derivatives on compact sets, and (ii) universal approximation of FNOs in weighted Sobolev spaces with input measures that have unbounded supports. Our theoretical results certify the capability of FNOs for accurate derivative-informed operator learning and accurate solution of PDE-constrained optimization. Furthermore, we develop efficient training schemes using dimension reduction and multi-resolution techniques that significantly reduce memory and computational costs for Fréchet derivative learning. Numerical examples on nonlinear diffusion--reaction, Helmholtz, and Navier--Stokes equations demonstrate that DIFNOs are superior in sample complexity for operator learning and solving infinite-dimensional PDE-constrained inverse problems, achieving high accuracy at low training sample sizes.
LG Apr 1
Performance of Neural and Polynomial Operator Surrogates
Josephine Westermann, Benno Huber, Thomas O'Leary-Roseberry et al.
We consider the problem of constructing surrogate operators for parameter-to-solution maps arising from parametric partial differential equations, where repeated forward model evaluations are computationally expensive. We present a systematic empirical comparison of neural operator surrogates, including a reduced-basis neural operator trained with $L^2_μ$ and $H^1_μ$ objectives and the Fourier neural operator, against polynomial surrogate methods, specifically a reduced-basis sparse-grid surrogate and a reduced-basis tensor-train surrogate. All methods are evaluated on a linear parametric diffusion problem and a nonlinear parametric hyperelasticity problem, using input fields with algebraically decaying spectral coefficients at varying rates of decay $s$. To enable fair comparisons, we analyze ensembles of surrogate models generated by varying hyperparameters and compare the resulting Pareto frontiers of cost versus approximation accuracy, decomposing cost into contributions from data generation, setup, and evaluation. Our results show that no single method is universally superior. Polynomial surrogates achieve substantially better data efficiency for smooth input fields ($s \geq 2$), with convergence rates for the sparse-grid surrogate in agreement with theoretical predictions. For rough inputs ($s \leq 1$), the Fourier neural operator displays the fastest convergence rates. Derivative-informed training consistently improves data efficiency over standard $L^2_μ$ training, providing a competitive alternative for rough inputs in the low-data regime when Jacobian information is available at reasonable cost. These findings highlight the importance of matching the surrogate methodology to the regularity of the problem as well as accuracy demands and computational constraints of the application.
NA Mar 13, 2024
Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems
Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas
We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10--25 effective posterior samples.
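The delayed-acceptance idea can be sketched in a few lines. The example below is a simplified illustration, not the method of the paper: it uses a plain symmetric random-walk proposal instead of a geometric (gradient- and Hessian-informed) one, a one-dimensional Gaussian stands in for the expensive posterior, and a slightly wrong Gaussian stands in for the surrogate. All function names are hypothetical.

```python
import math
import random


def log_post_true(x: float) -> float:
    # Stand-in for the expensive posterior: N(1, 0.5^2), up to a constant.
    return -0.5 * ((x - 1.0) / 0.5) ** 2


def log_post_surr(x: float) -> float:
    # Stand-in for the cheap surrogate posterior: N(0.9, 0.6^2), up to a constant.
    return -0.5 * ((x - 0.9) / 0.6) ** 2


def delayed_acceptance(n_steps: int, step_size: float, seed: int = 0):
    rng = random.Random(seed)
    x = 0.0
    lt_x, ls_x = log_post_true(x), log_post_surr(x)
    samples, n_true_evals = [], 1
    for _ in range(n_steps):
        y = x + step_size * rng.gauss(0.0, 1.0)  # symmetric proposal
        ls_y = log_post_surr(y)
        # Stage 1: screen the proposal with the surrogate only.
        if math.log(1.0 - rng.random()) < ls_y - ls_x:
            # Stage 2: evaluate the true model and correct the surrogate error.
            lt_y = log_post_true(y)
            n_true_evals += 1
            if math.log(1.0 - rng.random()) < (lt_y - lt_x) - (ls_y - ls_x):
                x, lt_x, ls_x = y, lt_y, ls_y
        samples.append(x)
    return samples, n_true_evals


n_steps = 20000
samples, n_true_evals = delayed_acceptance(n_steps, step_size=1.0)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")
print(f"true-model evaluations: {n_true_evals} of {n_steps} steps")
```

The second stage keeps the chain targeting the true posterior even though the surrogate is biased, while proposals rejected in the first stage never trigger a true-model evaluation.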
NA Nov 19, 2024
LazyDINO: Fast, scalable, and efficiently amortized Bayesian inversion via structure-exploiting and surrogate-driven measure transport
Lianghao Cao, Joshua Chen, Michael Brennan et al.
We present LazyDINO, a transport map variational inference method for fast, scalable, and efficiently amortized solutions of high-dimensional nonlinear Bayesian inverse problems with expensive parameter-to-observable (PtO) maps. Our method consists of an offline phase in which we construct a derivative-informed neural surrogate of the PtO map using joint samples of the PtO map and its Jacobian. During the online phase, when given observational data, we seek rapid posterior approximation using surrogate-driven training of a lazy map [Brennan et al., NeurIPS, (2020)], i.e., a structure-exploiting transport map with low-dimensional nonlinearity. The trained lazy map then produces approximate posterior samples or density evaluations. Our surrogate construction is optimized for amortized Bayesian inversion using lazy map variational inference. We show that (i) the derivative-based reduced basis architecture [O'Leary-Roseberry et al., Comput. Methods Appl. Mech. Eng., 388 (2022)] minimizes the upper bound on the expected error in surrogate posterior approximation, and (ii) the derivative-informed training formulation [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] minimizes the expected error due to surrogate-driven transport map optimization. Our numerical results demonstrate that LazyDINO is highly efficient in cost amortization for Bayesian inversion. We observe one to two orders of magnitude reduction of offline cost for accurate posterior approximation, compared to simulation-based amortized inference via conditional transport and conventional surrogate-driven transport. In particular, LazyDINO outperforms Laplace approximation consistently using fewer than 1000 offline samples, while other amortized inference methods struggle and sometimes fail at 16,000 offline samples.
LG Feb 21, 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman, Lorena A. Barba, Joaquim R. R. A. Martins et al.
Scientific machine learning (SciML) models are transforming many scientific disciplines. However, the development of good modeling practices to increase the trustworthiness of SciML has lagged behind its application, limiting its potential impact. The goal of this paper is to start a discussion on establishing consensus-based good practices for predictive SciML. We identify key challenges in applying existing computational science and engineering guidelines, such as verification and validation protocols, and provide recommendations to address these challenges. Our discussion focuses on predictive SciML, which uses machine learning models to learn, improve, and accelerate numerical simulations of physical systems. While centered on predictive applications, our 16 recommendations aim to help researchers conduct and document their modeling processes rigorously across all SciML domains.
NA Apr 11, 2025
Dimension reduction for derivative-informed operator learning: An analysis of approximation errors
Dingcheng Luo, Thomas O'Leary-Roseberry, Peng Chen et al.
We study the derivative-informed learning of nonlinear operators between infinite-dimensional separable Hilbert spaces by neural networks. Such operators can arise from the solution of partial differential equations (PDEs), and are used in many simulation-based outer-loop tasks in science and engineering, such as PDE-constrained optimization, Bayesian inverse problems, and optimal experimental design. In these settings, the neural network approximations can be used as surrogate models to accelerate the solution of the outer-loop tasks. However, since outer-loop tasks in infinite dimensions often require knowledge of the underlying geometry, the approximation accuracy of the operator's derivatives can also significantly impact the performance of the surrogate model. Motivated by this, we analyze the approximation errors of neural operators in Sobolev norms over infinite-dimensional Gaussian input measures. We focus on the reduced basis neural operator (RBNO), which uses linear encoders and decoders defined on dominant input/output subspaces spanned by reduced sets of orthonormal bases. To this end, we study two methods for generating the bases: principal component analysis (PCA) and derivative-informed subspaces (DIS), which use the dominant eigenvectors of the covariance of the data or the derivatives as the reduced bases, respectively. We then derive bounds for errors arising from both the dimension reduction and the latent neural network approximation, including the sampling errors associated with the empirical estimation of the PCA/DIS. Our analysis is validated on numerical experiments with elliptic PDEs, where our results show that bases informed by the map (i.e., DIS or output PCA) yield accurate reconstructions and generalization errors for both the operator and its derivatives, while input PCA may underperform unless ranks and training sample sizes are sufficiently large.
OC May 31, 2023
Efficient PDE-Constrained optimization under high-dimensional uncertainty using derivative-informed neural operators
Dingcheng Luo, Thomas O'Leary-Roseberry, Peng Chen et al.
We propose a novel machine learning framework for solving optimization problems governed by large-scale partial differential equations (PDEs) with high-dimensional random parameters. Such optimization under uncertainty (OUU) problems may be computationally prohibitive using classical methods, particularly when a large number of samples is needed to evaluate risk measures at every iteration of an optimization algorithm, where each sample requires the solution of an expensive-to-solve PDE. To address this challenge, we propose a new neural operator approximation of the PDE solution operator that has the combined merits of (1) accurate approximation of not only the map from the joint inputs of random parameters and optimization variables to the PDE state, but also its derivative with respect to the optimization variables, (2) efficient construction of the neural network using reduced basis architectures that are scalable to high-dimensional OUU problems, and (3) requiring only a limited number of training data to achieve high accuracy for both the PDE solution and the OUU solution. We refer to such neural operators as multi-input reduced basis derivative-informed neural operators (MR-DINOs). We demonstrate the accuracy and efficiency of our approach through several numerical experiments, i.e., the risk-averse control of a semilinear elliptic PDE and the steady state Navier--Stokes equations in two and three spatial dimensions, each involving random field inputs. Across the examples, MR-DINOs offer $10^{3}$--$10^{7} \times$ reductions in execution time, and are able to produce OUU solutions of comparable accuracies to those from standard PDE-based solutions while being over $10 \times$ more cost-efficient after factoring in the cost of construction.
LG Dec 14, 2021
Learning High-Dimensional Parametric Maps via Reduced Basis Adaptive Residual Networks
Thomas O'Leary-Roseberry, Xiaosong Du, Anirban Chaudhuri et al.
We propose a scalable framework for the learning of high-dimensional parametric maps via adaptively constructed residual network (ResNet) maps between reduced bases of the inputs and outputs. When only a few training data are available, it is beneficial to have a compact parametrization in order to ameliorate the ill-posedness of the neural network training problem. By linearly restricting high-dimensional maps to informed reduced bases of the inputs, one can compress high-dimensional maps in a constructive way that can be used to detect appropriate basis ranks, equipped with rigorous error estimates. A scalable neural network learning framework is thus to learn the nonlinear compressed reduced basis mapping. Unlike the reduced basis construction, however, neural network constructions are not guaranteed to reduce errors by adding representation power, making it difficult to achieve good practical performance. Inspired by recent approximation theory that connects ResNets to sequential minimizing flows, we present an adaptive ResNet construction algorithm. This algorithm allows for depth-wise enrichment of the neural network approximation, in a manner that can achieve good practical performance by first training a shallow network and then adapting. We prove universal approximation of the associated neural network class for $L^2_ν$ functions on compact sets. Our overall framework allows for constructive means to detect appropriate breadth and depth, and related compact parametrizations of neural networks, significantly reducing the need for architectural hyperparameter tuning. Numerical experiments for parametric PDE problems and a 3D CFD wing design optimization parametric map demonstrate that the proposed methodology can achieve remarkably high accuracy for limited training data, and outperformed other neural network strategies we compared against.
NA Nov 30, 2020
Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs
Thomas O'Leary-Roseberry, Umberto Villa, Peng Chen et al.
Many-query problems, arising from uncertainty quantification, Bayesian inversion, Bayesian optimal experimental design, and optimization under uncertainty, require numerous evaluations of a parameter-to-output map. These evaluations become prohibitive if this parametric map is high-dimensional and involves expensive solution of partial differential equations (PDEs). To tackle this challenge, we propose to construct surrogates for high-dimensional PDE-governed parametric maps in the form of projected neural networks that parsimoniously capture the geometry and intrinsic low-dimensionality of these maps. Specifically, we compute Jacobians of these PDE-based maps, and project the high-dimensional parameters onto a low-dimensional derivative-informed active subspace; we also project the possibly high-dimensional outputs onto their principal subspace. This exploits the fact that many high-dimensional PDE-governed parametric maps can be well-approximated in low-dimensional parameter and output subspaces. We use the projection basis vectors in the active subspace as well as the principal output subspace to construct the weights for the first and last layers of the neural network, respectively. This frees us to train the weights in only the low-dimensional layers of the neural network. The architecture of the resulting neural network captures, to first order, the low-dimensional structure and geometry of the parametric map. We demonstrate that the proposed projected neural network achieves greater generalization accuracy than a full neural network, especially in the limited training data regime afforded by expensive PDE-based parametric maps. Moreover, we show that the number of degrees of freedom of the inner layers of the projected network is independent of the parameter and output dimensions, and high accuracy can be achieved with weight dimension independent of the discretization dimension.
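The projection construction described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a synthetic map `f(m) = A tanh(B m)` stands in for a PDE-governed map, its Jacobian is available in closed form, and the names `V_r`, `U_r`, and `projected_net` are hypothetical. The input basis comes from the sample average of `J^T J` and the output basis from the output sample covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, n_samples = 50, 30, 3, 200

# Hypothetical stand-in for a PDE-governed map: f(m) = A tanh(B m).
# It depends on m only through the r-dimensional quantity B m.
B = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
A = rng.standard_normal((d_out, r))


def f(m):
    return A @ np.tanh(B @ m)


def jacobian(m):
    # Chain rule: A diag(1 - tanh(B m)^2) B
    return A @ ((1.0 - np.tanh(B @ m) ** 2)[:, None] * B)


samples = rng.standard_normal((n_samples, d_in))

# Input basis: dominant eigenvectors of the sample average of J^T J.
H = sum(jacobian(m).T @ jacobian(m) for m in samples) / n_samples
_, in_eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
V_r = in_eigvecs[:, ::-1][:, :r]

# Output basis: dominant eigenvectors of the output sample covariance.
outputs = np.array([f(m) for m in samples])
y_mean = outputs.mean(axis=0)
C = (outputs - y_mean).T @ (outputs - y_mean) / (n_samples - 1)
_, out_eigvecs = np.linalg.eigh(C)
U_r = out_eigvecs[:, ::-1][:, :r]


def projected_net(m, W, b):
    """First and last layers are fixed to V_r and U_r; only W and b are trainable."""
    z = V_r.T @ m
    return y_mean + U_r @ np.tanh(W @ z + b)


# Check how much of the map is lost by the two projections at a new point.
m_test = rng.standard_normal(d_in)
y_test = f(m_test)
input_gap = np.linalg.norm(y_test - f(V_r @ (V_r.T @ m_test)))
output_gap = np.linalg.norm(y_test - (y_mean + U_r @ (U_r.T @ (y_test - y_mean))))
print(f"input projection gap: {input_gap:.2e}, output projection gap: {output_gap:.2e}")
```

Because the synthetic map is exactly rank `r`, both gaps are at round-off level here. For a real PDE map the spectra decay gradually and the ranks would be chosen from that decay. Note that the trainable inner weights `W` and `b` have sizes set by `r` alone, not by `d_in` or `d_out`.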
OC Feb 7, 2020
Ill-Posedness and Optimization Geometry for Nonlinear Neural Network Training
Thomas O'Leary-Roseberry, Omar Ghattas
In this work we analyze the role nonlinear activation functions play at stationary points of dense neural network training problems. We consider a generic least squares loss function training formulation. We show that the nonlinear activation functions used in the network construction play a critical role in classifying stationary points of the loss landscape. We show that for shallow dense networks, the nonlinear activation function determines the Hessian nullspace in the vicinity of global minima (if they exist), and therefore determines the ill-posedness of the training problem. Furthermore, for shallow nonlinear networks we show that the zeros of the activation function and its derivatives can lead to spurious local minima, and discuss conditions for strict saddle points. We extend these results to deep dense neural networks, showing that the last activation function plays an important role in classifying stationary points, due to how it shows up in the gradient from the chain rule.
OC Feb 7, 2020
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization
Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas
In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as generalization properties. Additionally, due to perceived costs of forming and factorizing Hessians, second order methods are not used for these problems. In this work we motivate the extension of Newton methods to the SA regime, and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of making a low rank approximation. Additionally, LRSFN can facilitate fast escape from indefinite regions leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and stability of the method is key. We introduce a continuous time stability analysis framework, and use it to demonstrate that stochastic errors for Newton methods can be greatly amplified by ill-conditioned Hessians. The LRSFN method mitigates this stability issue via Levenberg-Marquardt damping. However, generally the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that other methods have issues with; and even under restrictive step length conditions, LRSFN can outperform popular first order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.
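A damped low-rank saddle-free Newton direction can be sketched with Hessian-vector products only. The code below is an illustration under assumptions and not the authors' implementation: the randomized range finder is one possible choice of low-rank eigensolver, the oversampling value is arbitrary, and `lrsfn_step` is a hypothetical name. It forms the direction $-(V_r |\Lambda_r| V_r^T + \gamma I)^{-1} g$ using the identity for orthonormal $V_r$, so the Hessian itself is never built inside the routine.

```python
import numpy as np


def lrsfn_step(hess_vec, grad, rank, damping, rng, oversample=5):
    """Direction -(V |Lam| V^T + damping * I)^{-1} grad from a rank-`rank` eigen-approximation."""
    n = grad.size
    k = min(rank + oversample, n)
    # Randomized range finder (assumed eigensolver choice): sample the Hessian's range.
    omega = rng.standard_normal((n, k))
    Y = np.column_stack([hess_vec(omega[:, j]) for j in range(k)])
    Q, _ = np.linalg.qr(Y)
    # Small projected eigenproblem, symmetrized against round-off.
    T = Q.T @ np.column_stack([hess_vec(Q[:, j]) for j in range(Q.shape[1])])
    lam, S = np.linalg.eigh(0.5 * (T + T.T))
    keep = np.argsort(-np.abs(lam))[:rank]  # dominant eigenvalues by magnitude
    lam, V = lam[keep], Q @ S[:, keep]
    # Woodbury-type identity for orthonormal V:
    # (V |Lam| V^T + g I)^{-1} = V diag(1/(|lam|+g) - 1/g) V^T + I/g
    coeff = 1.0 / (np.abs(lam) + damping) - 1.0 / damping
    return -(V @ (coeff * (V.T @ grad)) + grad / damping)


# Indefinite quadratic test problem f(x) = 0.5 x^T H x with an exactly rank-3 Hessian.
rng = np.random.default_rng(0)
n, damping = 20, 1e-1
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.zeros(n)
eigs[:3] = [5.0, -4.0, 3.0]
H = Q0 @ np.diag(eigs) @ Q0.T
x = rng.standard_normal(n)
grad = H @ x

step = lrsfn_step(lambda v: H @ v, grad, rank=3, damping=damping, rng=rng)

# Dense reference using |H| built from the known eigendecomposition.
abs_H = Q0 @ np.diag(np.abs(eigs)) @ Q0.T
dense_step = -np.linalg.solve(abs_H + damping * np.eye(n), grad)
print("matches dense reference:", np.allclose(step, dense_step, atol=1e-8))
print("descent direction:", float(grad @ step) < 0.0)
```

Taking absolute values of the retained eigenvalues makes the modified Hessian positive definite, so the direction is a descent direction even in indefinite regions, and the damping term bounds the step in directions the low-rank approximation does not resolve.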