MLMar 13, 2024
Multifidelity linear regression for scientific machine learning from scarce dataElizabeth Qian, Dayoung Kang, Vignesh Sella et al.
Machine learning (ML) methods, which fit to data the parameters of a given parameterized model class, have garnered significant interest as potential methods for learning surrogate models for complex engineering systems for which traditional simulation is expensive. However, in many scientific and engineering settings, generating high-fidelity data on which to train ML models is expensive, and the available budget for generating training data is limited, so that high-fidelity training data are scarce. ML models trained on scarce data have high variance, resulting in poor expected generalization performance. We propose a new multifidelity training approach for scientific machine learning via linear regression that exploits the scientific context where data of varying fidelities and costs are available: for example, high-fidelity data may be generated by an expensive fully resolved physics simulation whereas lower-fidelity data may arise from a cheaper model based on simplifying assumptions. We use the multifidelity data within an approximate control variate framework to define new multifidelity Monte Carlo estimators for linear regression models. We provide bias and variance analysis of our new estimators that guarantee the approach's accuracy and improved robustness to scarce high-fidelity data. Numerical results demonstrate that our multifidelity training approach achieves similar accuracy to the standard high-fidelity only approach with orders-of-magnitude reduced high-fidelity data requirements.
MLAug 11, 2025
Projection-based multifidelity linear regression for data-scarce applicationsVignesh Sella, Julie Pham, Karen Willcox et al.
Surrogate modeling for systems with high-dimensional quantities of interest remains challenging, particularly when training data are costly to acquire. This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs. Multifidelity methods integrate many inexpensive low-fidelity model evaluations with limited, costly high-fidelity evaluations. We introduce two projection-based multifidelity linear regression approaches that leverage principal component basis vectors for dimensionality reduction and combine multifidelity data through: (i) a direct data augmentation using low-fidelity data, and (ii) a data augmentation incorporating explicit linear corrections between low-fidelity and high-fidelity data. The data augmentation approaches combine high-fidelity and low-fidelity data into a unified training set and train the linear regression model through weighted least squares with fidelity-specific weights. Various weighting schemes and their impact on regression accuracy are explored. The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight. In a low-data regime of no more than ten high-fidelity samples, multifidelity linear regression achieves approximately 3% - 12% improvement in median accuracy compared to single-fidelity methods with comparable computational cost.