CHEM-PH · LG · Nov 29, 2024

OpenQDC: Open Quantum Data Commons

MILA
arXiv:2411.19629v1 · 1 citation · h-index: 10 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the inaccessibility and fragmentation of QM datasets for researchers developing MLIPs. Consolidating these datasets in one place facilitates collaboration and innovation in molecular dynamics simulations.

The authors tackled the fragmentation of quantum-mechanical datasets for machine learning interatomic potentials by introducing OpenQDC, a consolidated resource with 37 datasets from over 250 quantum methods and 400 million geometries, which revealed challenges for existing architectures and established a leaderboard for benchmarking.

Machine Learning Interatomic Potentials (MLIPs) are a highly promising alternative to classical force fields for molecular dynamics (MD) simulations, offering precise and rapid energy and force calculations. However, Quantum-Mechanical (QM) datasets, crucial for MLIPs, are fragmented across various repositories, hindering accessibility and model development. We introduce the openQDC package, consolidating 37 QM datasets from over 250 quantum methods and 400 million geometries into a single, accessible resource. These datasets are meticulously preprocessed and standardized for MLIP training, covering a wide range of chemical elements and interactions relevant in organic chemistry. OpenQDC includes tools for normalization and integration, easily accessible via Python. Experiments with well-known architectures like SchNet, TorchMD-Net, and DimeNet reveal challenges for those architectures and constitute a leaderboard to accelerate benchmarking and guide the development of novel algorithms. Continuously adding datasets to OpenQDC will democratize QM dataset access, foster more collaboration and innovation, enhance MLIP development, and support their adoption in the MD field.
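The abstract mentions standardization and normalization tools without describing them. As an illustration only, here is a generic sketch of one common approach, assuming raw energies in Hartree and forces in Hartree/Bohr. It converts them to eV and eV/Å, then fits per-element reference energies by linear least squares and subtracts them from the total energies. The function names (to_ev_angstrom, fit_atomic_references, normalize_energies) are hypothetical and are not part of the openQDC API. This is not a description of how openQDC implements normalization.

```python
import numpy as np

# Hypothetical helpers for illustration; not the openQDC API.
HARTREE_TO_EV = 27.211386245988        # 1 Hartree in eV (CODATA 2018)
BOHR_TO_ANGSTROM = 0.529177210903      # 1 Bohr in Angstrom (CODATA 2018)


def to_ev_angstrom(energy_ha, forces_ha_bohr):
    """Convert energies (Hartree) to eV and forces (Hartree/Bohr) to eV/Angstrom."""
    energy_ev = np.asarray(energy_ha, dtype=float) * HARTREE_TO_EV
    forces_ev_a = np.asarray(forces_ha_bohr, dtype=float) * (HARTREE_TO_EV / BOHR_TO_ANGSTROM)
    return energy_ev, forces_ev_a


def fit_atomic_references(atomic_numbers_list, energies, elements):
    """Least-squares fit of per-element energies E_ref[z] so that
    E_total ~= sum over atoms i of E_ref[z_i].

    Returns the fitted references as a dict and the element-count matrix.
    """
    index = {z: j for j, z in enumerate(elements)}
    counts = np.zeros((len(atomic_numbers_list), len(elements)))
    for row, zs in enumerate(atomic_numbers_list):
        for z in zs:
            counts[row, index[z]] += 1  # count atoms of each element per molecule
    refs, *_ = np.linalg.lstsq(counts, np.asarray(energies, dtype=float), rcond=None)
    return {z: float(refs[index[z]]) for z in elements}, counts


def normalize_energies(energies, counts, refs, elements):
    """Subtract the fitted per-element contributions, leaving a residual energy."""
    ref_vec = np.array([refs[z] for z in elements])
    return np.asarray(energies, dtype=float) - counts @ ref_vec
```

Subtracting fitted per-element references removes the large, composition-dependent offset in total energies, so a model trained on the residual sees targets on a comparable scale across molecules of different sizes and compositions.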

Code Implementations: 1 repo
