Peiyuan Gao

CE
h-index8
4papers
27citations
Novelty43%
AI Score24

4 Papers

NAMar 18, 2019
A data-driven framework for sparsity-enhanced surrogates with arbitrary mutually dependent randomness

Huan Lei, Jing Li, Peiyuan Gao et al.

The challenge of quantifying uncertainty propagation in real-world systems is rooted in the high-dimensionality of the stochastic input and the frequent lack of explicit knowledge of its probability distribution. Traditional approaches show limitations for such problems. To address these difficulties, we have developed a general framework of constructing surrogate models on spaces of stochastic input with arbitrary probability measure irrespective of the mutual dependencies between individual components and the analytical form. The present Data-driven Sparsity-enhancing Rotation for Arbitrary Randomness (DSRAR) framework includes a data-driven construction of multivariate polynomial basis for arbitrary mutually dependent probability measure and a sparsity enhancement rotation procedure. This sparsity-enhancing rotation method was initially proposed in our previous work [1] for Gaussian distributions, which may not be feasible for non-Gaussian distributions due to the loss of orthogonality after the rotation. To remedy such difficulties, we developed the new approach to construct orthonormal polynomials for arbitrary mutually dependent (amdP) randomness, ensuring the constructed basis maintains the orthogonality with respect to the density of the rotated random vector, where directly applying the regular polynomial chaos including arbitrary polynomial chaos (aPC) [2] shows limitations due to the assumption of the mutual independence between the components of the random inputs. The developed DSRAR framework leads to accurate recovery of a sparse representation of the target functions. The effectiveness of our method is demonstrated in challenging problems such as PDEs and realistic molecular systems where the underlying density is implicitly represented by a large collection of sample data, as well as systems with explicitly given non-Gaussian probabilistic measures.

LGOct 16, 2024
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Levels of Interpretability

Gihan Panapitiya, Peiyuan Gao, C Mark Maupin et al.

Molecular property prediction is essential in a variety of contemporary scientific fields, such as drug development and designing energy storage materials. Although there are many machine learning models available for this purpose, those that achieve high accuracy while also offering interpretability of predictions are uncommon. We present a graph neural network that not only matches the prediction accuracies of leading models but also provides insights on four levels of molecular substructures. This model helps identify which atoms, bonds, molecular fragments, and connections between fragments are significant for predicting a specific molecular property. Understanding the importance of connections between fragments is particularly valuable for molecules with substructures that do not connect through standard bonds. The model additionally can quantify the impact of specific fragments on the prediction, allowing the identification of fragments that may improve or degrade a property value. These interpretable features are essential for deriving scientific insights from the model's learned relationships between molecular structures and properties.

COMP-PHSep 2, 2023
Physics-informed machine learning of the correlation functions in bulk fluids

Wenqian Chen, Peiyuan Gao, Panos Stinis

The Ornstein-Zernike (OZ) equation is the fundamental equation for pair correlation function computations in the modern integral equation theory for liquids. In this work, machine learning models, notably physics-informed neural networks and physics-informed neural operator networks, are explored to solve the OZ equation. The physics-informed machine learning models demonstrate great accuracy and high efficiency in solving the forward and inverse OZ problems of various bulk fluids. The results highlight the significant potential of physics-informed machine learning for applications in thermodynamic state theory.

CEJun 15, 2021
Graphical Gaussian Process Regression Model for Aqueous Solvation Free Energy Prediction of Organic Molecules in Redox Flow Battery

Peiyuan Gao, Xiu Yang, Yu-Hang Tang et al.

The solvation free energy of organic molecules is a critical parameter in determining emergent properties such as solubility, liquid-phase equilibrium constants, and pKa and redox potentials in an organic redox flow battery. In this work, we present a machine learning (ML) model that can learn and predict the aqueous solvation free energy of an organic molecule using Gaussian process regression method based on a new molecular graph kernel. To investigate the performance of the ML model on electrostatic interaction, the nonpolar interaction contribution of solvent and the conformational entropy of solute in solvation free energy, three data sets with implicit or explicit water solvent models, and contribution of conformational entropy of solute are tested. We demonstrate that our ML model can predict the solvation free energy of molecules at chemical accuracy with a mean absolute error of less than 1 kcal/mol for subsets of the QM9 dataset and the Freesolv database. To solve the general data scarcity problem for a graph-based ML model, we propose a dimension reduction algorithm based on the distance between molecular graphs, which can be used to examine the diversity of the molecular data set. It provides a promising way to build a minimum training set to improve prediction for certain test sets where the space of molecular structures is predetermined.