ML · Dec 30, 2025
Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference
Shane A. McQuarrie, Mengwu Guo, Anirban Chaudhuri
This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual assets in a digital twin. Data-driven ROMs are explainable, computationally efficient scientific machine learning models that aim to preserve the underlying physics of complex dynamical simulations. Since the quality of data-driven ROMs is sensitive to the quality of the limited training data, we seek to identify training parameters for which using the associated training data results in the best possible parametric ROM. Our approach uses the operator inference methodology, a regression-based strategy which can be tailored to particular parametric structure for a large class of problems. We establish a probabilistic version of parametric operator inference, casting the learning problem as a Bayesian linear regression. Prediction uncertainties stemming from the resulting probabilistic ROM solutions are used to design a sequential adaptive sampling scheme to select new training parameter vectors that promote ROM stability and accuracy globally in the parameter domain. We conduct numerical experiments for several nonlinear parametric systems of partial differential equations and compare the results to ROMs trained on random parameter samples. The results demonstrate that the proposed adaptive sampling strategy consistently yields more stable and accurate ROMs than random sampling does under the same computational budget.
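The step of casting a regression problem as a Bayesian linear regression can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a zero-mean isotropic Gaussian prior on the weights and Gaussian noise of known precision, and the names `bayes_linreg`, `predictive`, `alpha`, and `beta` are hypothetical. The final lines use the largest predictive variance as a toy stand-in for an uncertainty-driven sampling criterion.

```python
import numpy as np

def bayes_linreg(D, y, alpha=1e-2, beta=1e2):
    """Posterior mean and covariance of w in y = D @ w + noise.

    Assumed prior: w ~ N(0, I / alpha). Assumed noise: N(0, 1 / beta).
    """
    k = D.shape[1]
    precision = alpha * np.eye(k) + beta * D.T @ D
    cov = np.linalg.inv(precision)
    mean = beta * cov @ D.T @ y
    return mean, cov

def predictive(d_new, mean, cov, beta=1e2):
    """Predictive mean and variance at a single feature vector d_new."""
    mu = d_new @ mean
    var = 1.0 / beta + d_new @ cov @ d_new
    return mu, var

# Synthetic data (hypothetical): three weights, noise std 0.1 (beta = 100).
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = D @ w_true + rng.normal(scale=0.1, size=50)

mean, cov = bayes_linreg(D, y)
mu, var = predictive(np.array([1.0, 0.0, 0.0]), mean, cov)

# Toy acquisition rule: choose the candidate with the largest predictive variance.
candidates = rng.normal(size=(20, 3))
variances = [predictive(c, mean, cov)[1] for c in candidates]
next_idx = int(np.argmax(variances))
```

In the abstract's setting the uncertainty is propagated through ROM solutions over the parameter domain; the variance of a single linear prediction used here is only the simplest analogue.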
ML · Mar 13, 2024
Multifidelity linear regression for scientific machine learning from scarce data
Elizabeth Qian, Dayoung Kang, Vignesh Sella et al.
Machine learning (ML) methods, which fit to data the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the available budget for generating training data is limited, so that high-fidelity training data are scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context where data of varying fidelities and costs are available: for example, high-fidelity data may be generated by an expensive fully resolved physics simulation whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analyses of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves similar accuracy to the standard high-fidelity-only approach with orders-of-magnitude reduced high-fidelity data requirements.
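The control variate idea behind such estimators can be sketched for the simplest case, a scalar mean, rather than for the regression estimators of the paper. Everything here is an assumption made for illustration: the models `f_hi` and `f_lo`, the function `mf_mean`, and the fixed control variate coefficient `coeff`.

```python
import random
import statistics

def f_hi(x):
    """Hypothetical expensive high-fidelity model."""
    return x**2 + 0.1 * x

def f_lo(x):
    """Hypothetical cheap low-fidelity model, correlated with f_hi."""
    return x**2

def mf_mean(n_hi, n_lo, rng, coeff=1.0):
    """Two-fidelity control variate estimate of E[f_hi(X)], X ~ N(0, 1).

    The low-fidelity model is evaluated on the n_hi shared inputs and on
    n_lo - n_hi extra inputs; the difference of its two sample means
    corrects the small-sample high-fidelity mean.
    """
    xs_hi = [rng.gauss(0.0, 1.0) for _ in range(n_hi)]
    xs_lo = xs_hi + [rng.gauss(0.0, 1.0) for _ in range(n_lo - n_hi)]
    hi_mean = statistics.fmean(f_hi(x) for x in xs_hi)
    lo_mean_small = statistics.fmean(f_lo(x) for x in xs_hi)
    lo_mean_large = statistics.fmean(f_lo(x) for x in xs_lo)
    return hi_mean + coeff * (lo_mean_large - lo_mean_small)

rng = random.Random(0)
estimate = mf_mean(n_hi=10, n_lo=1000, rng=rng)
```

Because the correction term has zero mean, the estimator stays unbiased for any fixed `coeff`; the variance reduction depends on how strongly the two models are correlated and on how many extra low-fidelity samples are affordable.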
ML · Aug 11, 2025
Projection-based multifidelity linear regression for data-scarce applications
Vignesh Sella, Julie Pham, Karen Willcox et al.
Surrogate modeling for systems with high-dimensional quantities of interest remains challenging, particularly when training data are costly to acquire. This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs. Multifidelity methods integrate many inexpensive low-fidelity model evaluations with limited, costly high-fidelity evaluations. We introduce two projection-based multifidelity linear regression approaches that leverage principal component basis vectors for dimensionality reduction and combine multifidelity data through: (i) a direct data augmentation using low-fidelity data, and (ii) a data augmentation incorporating explicit linear corrections between low-fidelity and high-fidelity data. The data augmentation approaches combine high-fidelity and low-fidelity data into a unified training set and train the linear regression model through weighted least squares with fidelity-specific weights. Various weighting schemes and their impact on regression accuracy are explored. The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight. In a low-data regime of no more than ten high-fidelity samples, multifidelity linear regression achieves approximately 3% to 12% improvement in median accuracy compared to single-fidelity methods with comparable computational cost.
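The direct data augmentation variant, approach (i), can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the principal component basis is computed from the stacked outputs, the weights `w_hi` and `w_lo` are arbitrary placeholders, and the function names are hypothetical.

```python
import numpy as np

def pca_basis(Y, r):
    """Mean and leading r principal directions of the rows of Y."""
    mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:r].T

def weighted_lstsq(X, Z, w):
    """Minimize sum_i w_i * ||x_i @ C - z_i||^2 by scaling rows with sqrt(w_i)."""
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * X, sw * Z, rcond=None)
    return coef

def fit_multifidelity(X_hi, Y_hi, X_lo, Y_lo, r=2, w_hi=1.0, w_lo=0.1):
    """Stack both fidelities, project outputs, fit by weighted least squares."""
    X = np.vstack([X_hi, X_lo])
    Y = np.vstack([Y_hi, Y_lo])
    mean, V = pca_basis(Y, r)
    Z = (Y - mean) @ V  # reduced output coordinates
    w = np.concatenate([np.full(len(X_hi), w_hi), np.full(len(X_lo), w_lo)])
    X1 = np.hstack([np.ones((len(X), 1)), X])  # add intercept column
    coef = weighted_lstsq(X1, Z, w)

    def predict(X_new):
        X1_new = np.hstack([np.ones((len(X_new), 1)), X_new])
        return mean + (X1_new @ coef) @ V.T

    return predict

# Hypothetical data: the low-fidelity model is a perturbed copy of the high-fidelity one.
rng = np.random.default_rng(0)
A_hi = rng.normal(size=(2, 20))
A_lo = A_hi + 0.05 * rng.normal(size=(2, 20))
X_hi, X_lo = rng.normal(size=(5, 2)), rng.normal(size=(40, 2))
predict = fit_multifidelity(X_hi, X_hi @ A_hi, X_lo, X_lo @ A_lo)
Y_pred = predict(rng.normal(size=(3, 2)))
```

Approach (ii) would first map the low-fidelity outputs through a fitted linear correction before stacking; that step is omitted here.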
LG · Dec 14, 2021
Learning High-Dimensional Parametric Maps via Reduced Basis Adaptive Residual Networks
Thomas O'Leary-Roseberry, Xiaosong Du, Anirban Chaudhuri et al.
We propose a scalable framework for learning high-dimensional parametric maps via adaptively constructed residual network (ResNet) maps between reduced bases of the inputs and outputs. When only a few training data are available, a compact parametrization helps ameliorate the ill-posedness of the neural network training problem. By linearly restricting high-dimensional maps to informed reduced bases of the inputs, one can compress high-dimensional maps in a constructive way that can be used to detect appropriate basis ranks, equipped with rigorous error estimates. A scalable neural network learning framework then learns the nonlinear compressed reduced basis mapping. Unlike the reduced basis construction, however, neural network constructions are not guaranteed to reduce errors by adding representation power, making it difficult to achieve good practical performance. Inspired by recent approximation theory that connects ResNets to sequential minimizing flows, we present an adaptive ResNet construction algorithm. This algorithm allows for depth-wise enrichment of the neural network approximation, in a manner that can achieve good practical performance by first training a shallow network and then adapting. We prove universal approximation of the associated neural network class for $L^2_\nu$ functions on compact sets. Our overall framework allows for constructive means to detect appropriate breadth and depth, and related compact parametrizations of neural networks, significantly reducing the need for architectural hyperparameter tuning. Numerical experiments for parametric PDE problems and a 3D CFD wing design optimization parametric map demonstrate that the proposed methodology can achieve remarkably high accuracy for limited training data and outperforms the other neural network strategies we compared against.
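The structure of a ResNet acting between reduced bases, and the idea of depth-wise enrichment, can be sketched in a forward pass. The sketch makes several assumptions that do not come from the abstract: random orthonormal matrices stand in for the informed input and output bases, both bases have the same rank `r`, and a new block is initialized with zero output weights so that it starts as the identity map. No training is shown.

```python
import numpy as np

def resnet_forward(z, layers):
    """Apply residual blocks z <- z + tanh(z @ W1 + b1) @ W2 in reduced coordinates."""
    for W1, b1, W2 in layers:
        z = z + np.tanh(z @ W1 + b1) @ W2
    return z

def new_block(r, width, rng):
    """Residual block with zero output weights, so it starts as the identity (assumed init)."""
    W1 = rng.normal(scale=0.1, size=(r, width))
    return (W1, np.zeros(width), np.zeros((width, r)))

def surrogate(x, V_in, V_out, layers):
    """Project inputs onto V_in, map reduced coordinates, lift with V_out."""
    return resnet_forward(x @ V_in, layers) @ V_out.T

rng = np.random.default_rng(0)
d_in, d_out, r, width = 50, 80, 4, 8

# Placeholder orthonormal bases; the paper uses informed reduced bases instead.
V_in = np.linalg.qr(rng.normal(size=(d_in, r)))[0]
V_out = np.linalg.qr(rng.normal(size=(d_out, r)))[0]

# A "trained" shallow network (random weights here) and its enriched version.
shallow = [(rng.normal(size=(r, width)), rng.normal(size=width), rng.normal(size=(width, r)))]
deeper = shallow + [new_block(r, width, rng)]

x = rng.normal(size=(3, d_in))
y_shallow = surrogate(x, V_in, V_out, shallow)
y_deeper = surrogate(x, V_in, V_out, deeper)
```

With this initialization the enriched network reproduces the shallow network's output exactly before any further training, so adding depth cannot increase the training error at the moment of enrichment.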
ML · Oct 6, 2019
mfEGRA: Multifidelity Efficient Global Reliability Analysis through Active Learning for Failure Boundary Location
Anirban Chaudhuri, Alexandre N. Marques, Karen E. Willcox
This paper develops mfEGRA, a multifidelity active learning method using data-driven adaptively refined surrogates for failure boundary location in reliability analysis. This work addresses the prohibitive cost of reliability analysis using Monte Carlo sampling for expensive-to-evaluate high-fidelity models by using cheaper-to-evaluate approximations of the high-fidelity model. The method builds on the Efficient Global Reliability Analysis (EGRA) method, which is a surrogate-based method that uses adaptive sampling for refining Gaussian process surrogates for failure boundary location using a single-fidelity model. Our method introduces a two-stage adaptive sampling criterion that uses a multifidelity Gaussian process surrogate to leverage multiple information sources with different fidelities. The method combines the expected feasibility criterion from EGRA with one-step lookahead information gain to refine the surrogate around the failure boundary. The computational savings from mfEGRA depend on the discrepancy between the different models and on the cost of evaluating each model relative to the high-fidelity model. We show that accurate estimation of reliability using mfEGRA leads to computational savings of $\sim$46% for an analytic multimodal test problem and 24% for a three-dimensional acoustic horn problem, when compared to single-fidelity EGRA. We also show the effect of using a priori drawn Monte Carlo samples in the implementation for the acoustic horn problem, where mfEGRA leads to computational savings of 45% for the three-dimensional case and 48% for a rarer event four-dimensional case as compared to single-fidelity EGRA.