CHEM-PH | LG | Nov 5, 2023

Gradual Optimization Learning for Conformational Energy Minimization

arXiv:2311.06295v2 | 6 citations | h-index: 13
Originality: Highly original
AI Analysis

This work addresses a computational bottleneck in drug discovery and materials design, offering a significant reduction in data requirements for neural network-based optimization.

The paper tackles the problem of accelerating molecular conformation energy minimization by reducing the data needed to train neural networks that replace expensive physical simulators, achieving performance on par with the oracle using 50x less additional data.

Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to accelerate this procedure is to replace the physical simulator with a neural network. Despite recent progress in neural networks for molecular conformation energy prediction, such models are prone to distribution shift, leading to inaccurate energy minimization. We find that the quality of energy minimization with neural networks can be improved by providing optimization trajectories as additional training data. Still, it takes around $5 \times 10^5$ additional conformations to match the physical simulator's optimization quality. In this work, we present the Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks that significantly reduces the required additional data. The framework consists of an efficient data-collecting scheme and an external optimizer. The external optimizer utilizes gradients from the energy prediction model to generate optimization trajectories, and the data-collecting scheme selects additional training data to be processed by the physical simulator. Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules using $50$x less additional data.
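
The iterative procedure described in the abstract, where forces act as anti-gradients of the energy, can be illustrated with a minimal sketch. This is not the GOLF implementation: `toy_oracle` is a hypothetical stand-in for a physical simulator (a simple harmonic pairwise potential), and `minimize`, `step`, and `n_steps` are names assumed here for illustration. Plain gradient descent is used as the optimizer; the paper's external optimizer may differ.

```python
import numpy as np


def toy_oracle(positions, r0=1.0):
    """Hypothetical stand-in for a physical simulator (assumption, not from the paper).

    Energy is a harmonic pairwise potential:
        E = 0.5 * sum_{i<j} (|r_i - r_j| - r0)^2
    Returns (energy, forces), where forces = -dE/dr.
    """
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]  # (N, N, 3), r_i - r_j
    dist = np.linalg.norm(diff, axis=-1)                   # (N, N)
    upper = np.triu_indices(n, k=1)                        # each pair counted once
    energy = 0.5 * np.sum((dist[upper] - r0) ** 2)

    # dE/dr_i = sum_{j != i} (d_ij - r0) * (r_i - r_j) / d_ij
    safe_dist = dist + np.eye(n)                           # avoid 0/0 on the diagonal
    coeff = (dist - r0) / safe_dist
    np.fill_diagonal(coeff, 0.0)
    grad = np.sum(coeff[:, :, None] * diff, axis=1)        # (N, 3)
    return energy, -grad


def minimize(positions, energy_and_forces, step=0.1, n_steps=200):
    """Gradient-descent energy minimization using forces as anti-gradients.

    `energy_and_forces` can be the oracle or any surrogate model with the
    same (energy, forces) interface. Returns the final conformation and the
    full optimization trajectory.
    """
    x = positions.copy()
    trajectory = [x.copy()]
    for _ in range(n_steps):
        _, forces = energy_and_forces(x)
        x = x + step * forces          # move along the force, i.e. downhill in energy
        trajectory.append(x.copy())
    return x, trajectory


if __name__ == "__main__":
    start = np.array([[0.0, 0.0, 0.0],
                      [1.5, 0.0, 0.0],
                      [0.0, 0.7, 0.0]])
    final, trajectory = minimize(start, toy_oracle)
    print("initial energy:", toy_oracle(start)[0])
    print("final energy:  ", toy_oracle(final)[0])
    print("trajectory length:", len(trajectory))
```

Because `minimize` only requires an `(energy, forces)` callable, a trained neural network could in principle be passed in place of `toy_oracle`. The conformations stored in `trajectory` are the kind of intermediate states the abstract refers to as optimization trajectories used for additional training data.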

Code Implementations: 1 repo