Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials
This work addresses the need for efficient 3D geometry generation in computational chemistry, offering a way to reduce reliance on expensive DFT calculations, though the contribution is incremental because it builds on existing MLIP approaches.
The paper tackles the problem of obtaining accurate 3D molecular geometries for property prediction by using machine learning interatomic potential (MLIP) models trained on a large-scale dataset of 3.5 million molecules and 300 million snapshots; the resulting geometries improve downstream predictions.
Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million snapshots. We then train MLIP foundation models with supervised learning to predict energies and forces from 3D molecular structures. We show that, once trained, the foundation models can be used in two ways to obtain geometries, either explicitly or implicitly. First, they can be used to obtain low-energy 3D geometries via geometry optimization, providing relaxed 3D geometries for downstream molecular property prediction. To mitigate potential biases and enhance downstream predictions, we introduce geometry fine-tuning based on the relaxed 3D geometries. Second, the foundation models can be directly fine-tuned for property prediction when ground-truth 3D geometries are available. Our results demonstrate that MLIP foundation models trained on relaxation data can provide valuable molecular geometries that benefit property prediction.
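The abstract does not specify the model architecture, training loss, or optimizer, so the following is only a minimal sketch, under stated assumptions, of the two generic ingredients it describes: supervising an energy model on both energies and forces (with forces defined as the negative gradient of the energy), and using the model's forces to relax a geometry. All names here (`energy`, `forces`, `energy_force_loss`, `relax`, `BONDS`, `R0`, `K`) are hypothetical. A toy harmonic pair potential stands in for a trained MLIP, the weighted energy-plus-force squared-error loss is a common choice assumed for illustration, and plain steepest descent stands in for the optimizers (for example BFGS or FIRE) typically used in practice.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical stand-in for a trained MLIP: a harmonic pair potential.
# A real MLIP would be a neural network mapping (species, positions) -> energy.
BONDS = [(0, 1), (1, 2)]  # hypothetical bonded atom pairs
R0 = 1.0                  # hypothetical equilibrium bond length
K = 5.0                   # hypothetical spring constant


def energy(pos: Sequence[Sequence[float]]) -> float:
    """Toy total energy E(x) = sum over bonds of 0.5 * K * (r_ij - R0)^2."""
    return sum(0.5 * K * (math.dist(pos[i], pos[j]) - R0) ** 2 for i, j in BONDS)


def forces(pos: Sequence[Sequence[float]]) -> List[Vec]:
    """Forces F = -dE/dx, computed analytically for the toy pair potential."""
    f = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j in BONDS:
        r = math.dist(pos[i], pos[j])
        coef = K * (r - R0) / r  # dE/dr times dr/dx_i = (x_i - x_j) / r
        for d in range(3):
            g = coef * (pos[i][d] - pos[j][d])  # dE/dx_i along axis d
            f[i][d] -= g                         # F_i = -dE/dx_i
            f[j][d] += g                         # F_j = -dE/dx_j = +g
    return [tuple(v) for v in f]


def energy_force_loss(e_pred: float, f_pred: Sequence[Vec],
                      e_ref: float, f_ref: Sequence[Vec],
                      force_weight: float = 10.0) -> float:
    """Assumed supervised loss: squared energy error plus a weighted
    mean squared error over all force components."""
    e_err = (e_pred - e_ref) ** 2
    f_err = sum((a - b) ** 2
                for fp, fr in zip(f_pred, f_ref)
                for a, b in zip(fp, fr)) / (3 * len(f_ref))
    return e_err + force_weight * f_err


def relax(pos: Sequence[Sequence[float]], step: float = 0.05,
          fmax: float = 1e-4, max_steps: int = 10_000):
    """Steepest-descent geometry optimization: move atoms along the predicted
    forces until the largest force component falls below fmax."""
    x = [list(p) for p in pos]
    n_steps = 0
    for n_steps in range(max_steps):
        f = forces(x)
        if max(abs(c) for v in f for c in v) < fmax:
            break
        for p, v in zip(x, f):
            for d in range(3):
                p[d] += step * v[d]
    return [tuple(p) for p in x], n_steps


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (2.1, 0.2, 0.0)]
    relaxed, steps = relax(start)
    print(f"E before: {energy(start):.4f}, after: {energy(relaxed):.2e}, steps: {steps}")
```

Because `relax` only consumes forces, replacing the toy `energy` and `forces` with a trained model's predictions would leave the optimization loop unchanged; the relaxed coordinates it returns are what would then be fed to a downstream property model.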