CHEM-PHLGMLJun 20, 2024

$\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials

arXiv:2406.14347v27 citations
Originality Incremental advance
AI Analysis

This work addresses the scalability problem in computational quantum chemistry for drug discovery by providing a comprehensive dataset and benchmark, though it is incremental as it builds upon existing datasets like nablaDFT.

The authors tackled the need for large, diverse datasets to train neural network potentials (NNPs) in quantum chemistry by introducing $ abla^2$DFT, a dataset with twice as many molecular structures, three times more conformations, and new data types like relaxation trajectories for drug-like molecules, and they proposed a benchmark and framework for evaluating NNPs, achieving state-of-the-art models in tasks such as molecular property prediction and conformational optimization.

Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on the nablaDFT. It contains twice as much molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level ($ω$B97X-D/def2-SVP) for each conformation. Moreover, $\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes