ML LG, Feb 6, 2024

Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process

Sandipp Krishnan Ravi, Yigitcan Comlek, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Wei Chen, Liping Wang

arXiv:2402.04146v49.24 citations, h-index: 11, Eng. Appl. Artif. Intell.

Originality: Incremental advance

AI Analysis

This work addresses the challenge of integrating diverse data sources in fields like materials science, offering a method to improve predictive accuracy in sparse-data scenarios, though it appears incremental as it builds on existing LVGP techniques.

The paper tackles the problem of fusing multi-source data with varying quality and unknown physical parameters by proposing a Latent Variable Gaussian Process (LVGP) framework that maps sources into an interpretable latent space, resulting in better predictions for sparse-data problems compared to single-source or source-unaware models.

With the advent of artificial intelligence and machine learning, various science and engineering communities have leveraged data-driven surrogates to model complex systems by fusing numerous sources of information (data) from published papers, patents, open repositories, and other resources. However, little attention has been paid to differences in the quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources, which could have downstream implications during system optimization. Additionally, existing methods cannot fuse multi-source data into a single predictive model. To resolve this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic categorical variable that is mapped into a physically interpretable latent space, allowing the development of source-aware data fusion modeling. Additionally, a dissimilarity metric based on the latent variables of LVGP is introduced to study and understand the differences among the data sources. The proposed approach is demonstrated on and analyzed through two mathematical and two materials science case studies. The case studies show that, compared with single-source and source-unaware machine learning models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems.
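The source-aware idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared-exponential kernel over the numerical inputs concatenated with a 2-D latent vector per source, uses hand-picked latent positions `Z` (in LVGP these would be estimated by maximum likelihood together with the other hyperparameters), and takes the Euclidean distance between latent vectors as a stand-in for the paper's dissimilarity metric. All function names, the toy data, and the noise level are hypothetical.

```python
import numpy as np

def lvgp_kernel(X1, s1, X2, s2, Z, lengthscale=1.0):
    """Squared-exponential kernel on [x / lengthscale, z(source)].

    X1, X2: numerical inputs, shape (n, d). s1, s2: integer source labels.
    Z: latent positions, one row per source (assumed given, not fitted here).
    """
    A = np.hstack([X1 / lengthscale, Z[s1]])
    B = np.hstack([X2 / lengthscale, Z[s2]])
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist)

def gp_predict(X_train, s_train, y_train, X_test, s_test, Z, noise=1e-4):
    """Posterior mean of a zero-mean GP using the source-aware kernel."""
    K = lvgp_kernel(X_train, s_train, X_train, s_train, Z)
    K += noise * np.eye(len(y_train))  # jitter for numerical stability
    K_star = lvgp_kernel(X_test, s_test, X_train, s_train, Z)
    return K_star @ np.linalg.solve(K, y_train)

def source_dissimilarity(Z):
    """Pairwise Euclidean distance between source latent vectors (assumed metric)."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

# Hypothetical latent positions: sources 0 and 1 are similar, source 2 differs.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [1.5, 0.8]])

# Toy data: four points per source, with a source-dependent offset.
X_train = np.tile(np.linspace(0.0, 1.0, 4), 3)[:, None]
s_train = np.repeat([0, 1, 2], 4)
y_train = np.sin(2 * np.pi * X_train[:, 0]) + np.array([0.0, 0.05, 1.0])[s_train]

# Predict for source 0 while borrowing information from all three sources.
X_test = np.linspace(0.0, 1.0, 5)[:, None]
s_test = np.zeros(5, dtype=int)
y_pred = gp_predict(X_train, s_train, y_train, X_test, s_test, Z)

D = source_dissimilarity(Z)
```

Because the latent distance enters the kernel directly, data from a source that sits close to the target source in latent space is weighted more heavily in the prediction, while a distant source contributes little. Inspecting `D` then gives a simple reading of which sources behave alike.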
