CHEM-PH | MTRL-SCI | LG | Mar 20, 2024

Considerations in the use of ML interaction potentials for free energy calculations

arXiv:2403.13952v3 | 3 citations | h-index: 22 | J Chem Phys
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of building training datasets for machine learning force fields that predict free energy surfaces in computational chemistry. The contribution is incremental: it builds on existing EQNN methods.

The study investigated how the distribution of collective variables in the training data affects the accuracy of equivariant graph neural networks in predicting free energy surfaces for butane and alanine dipeptide. Accuracy is maintained as long as the training data includes the characteristic regions of the FES. Models trained on classical simulation data struggle to extrapolate to high-free-energy configurations, whereas models trained on ab initio data for the same configurations extrapolate better.

Machine learning force fields (MLFFs) promise to accurately describe the potential energy surface of molecules at the ab initio level of theory with improved computational efficiency. Within MLFFs, equivariant graph neural networks (EQNNs) have shown great promise in accuracy and performance and are the focus of this work. The capability of EQNNs to recover free energy surfaces (FES) remains to be thoroughly investigated. In this work, we investigate the impact of the distribution of collective variables (CVs) within the training data on the accuracy of EQNNs in predicting the FES of butane and alanine dipeptide (ADP). A generalizable workflow is presented in which training configurations are generated with classical molecular dynamics simulations, and energies and forces are obtained with ab initio calculations. We evaluate how bond and angle constraints in the training data influence the accuracy of EQNN force fields in reproducing the FES of the molecules at both classical and ab initio levels of theory. Results indicate that the model's accuracy is unaffected by the distribution of sampled CVs during training, provided that the training data includes configurations from characteristic regions of the system's FES. However, when the training data is obtained from classical simulations, the EQNN struggles to extrapolate the free energy for configurations with high free energy. In contrast, models trained with the same configurations on ab initio data show improved extrapolation accuracy. The findings underscore the difficulties in creating a comprehensive training dataset for EQNNs to predict FESs and highlight the importance of prior knowledge of the system's FES.
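
To make the link between a sampled CV distribution and an FES concrete, the sketch below computes a dihedral angle (a typical CV for butane) and estimates a one-dimensional free energy profile by Boltzmann inversion of a histogram, F(s) = -k_B T ln P(s). It is an illustrative sketch, not the authors' workflow: the function names `dihedral` and `free_energy_profile`, the kJ/mol units, the 300 K default, and the binning are all assumptions, and the estimator presumes unbiased equilibrium sampling with no reweighting.

```python
import numpy as np

# Boltzmann constant in kJ/(mol K); the unit choice is an assumption of this sketch.
KB_KJ_PER_MOL_K = 0.0083144626


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four atomic positions."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)


def free_energy_profile(cv_samples, temperature=300.0, bins=36,
                        value_range=(-np.pi, np.pi)):
    """Histogram estimate of F(s) = -kB*T*ln P(s), shifted so its minimum is 0.

    Bins that were never visited get an infinite free energy, which mirrors
    the fact that unsampled regions carry no information.
    """
    hist, edges = np.histogram(cv_samples, bins=bins, range=value_range,
                               density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        free_energy = -KB_KJ_PER_MOL_K * temperature * np.log(hist)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy


# Toy usage: twice as many samples in one basin as in the other.
samples = np.concatenate([np.full(100, -1.0), np.full(200, 1.0)])
centers, profile = free_energy_profile(samples, bins=2)
```

In this toy case the less populated basin sits k_B T ln 2 above the more populated one. Bins with no samples come out as infinite, which illustrates why a training or evaluation set that misses characteristic FES regions cannot constrain them.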

Code Implementations: 1 repo
