CHEM-PH | LG | Jun 20, 2024

QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules

arXiv:2406.14149v3 | 10 citations
Originality: Synthesis-oriented
AI Analysis

QeMFi gives researchers in machine learning and quantum chemistry a resource for benchmarking multifidelity models. The contribution is incremental: it fills a data gap rather than proposing new methods.

The authors tackled the lack of diverse multifidelity datasets for benchmarking machine learning models in quantum chemistry by introducing the QeMFi dataset, which includes five fidelities with properties like excitation energies and dipole moments, along with computation times for time-benefit analysis.

Progress in both Machine Learning (ML) and Quantum Chemistry (QC) methods has resulted in high-accuracy ML models for QC properties. Datasets such as MD17 and WS22 have been used to benchmark these models at some level of QC method, or fidelity, which refers to the accuracy of the chosen QC method. Multifidelity ML (MFML) methods, where models are trained on data from more than one fidelity, have been shown to be more effective than single-fidelity methods. Much research is progressing in this direction for diverse applications ranging from energy band gaps to excitation energies. One hurdle for effective research here is the lack of a diverse multifidelity dataset for benchmarking. We provide the Quantum chemistry MultiFidelity (QeMFi) dataset consisting of five fidelities calculated with the TD-DFT formalism. The fidelities differ in their basis set choice: STO-3G, 3-21G, 6-31G, def2-SVP, and def2-TZVP. QeMFi offers the community a variety of QC properties, such as vertical excitation properties and molecular dipole moments, and further includes QC computation times, allowing for a time-benefit benchmark of multifidelity models for ML-QC.
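To make the time-benefit idea concrete, the following is a minimal sketch of how per-fidelity computation times could be turned into a data-generation cost for a single-fidelity versus a multifidelity training set. Only the five basis-set names come from the abstract. The timing values, the function names, and the scheme of doubling the sample count at each step down in fidelity are assumptions made for illustration; they are not the timings shipped with QeMFi or the authors' benchmark procedure.

```python
# Fidelity labels taken from the abstract, ordered from cheapest to costliest basis set.
FIDELITIES = ["STO-3G", "3-21G", "6-31G", "def2-SVP", "def2-TZVP"]

# Hypothetical mean cost per calculation in seconds (placeholder values only,
# NOT the computation times provided with QeMFi).
MEAN_TIME_S = {
    "STO-3G": 1.0,
    "3-21G": 2.0,
    "6-31G": 4.0,
    "def2-SVP": 10.0,
    "def2-TZVP": 50.0,
}


def training_set_cost(n_samples):
    """Total time in seconds to generate n_samples[f] calculations per fidelity f."""
    unknown = set(n_samples) - set(FIDELITIES)
    if unknown:
        raise KeyError(f"unknown fidelities: {sorted(unknown)}")
    return sum(MEAN_TIME_S[f] * n for f, n in n_samples.items())


def nested_sample_counts(n_top, scaling=2, lowest="STO-3G"):
    """Sample counts that grow by `scaling` per step down in fidelity (assumed scheme)."""
    used = FIDELITIES[FIDELITIES.index(lowest):]
    top = len(used) - 1
    return {f: n_top * scaling ** (top - i) for i, f in enumerate(used)}


# Single-fidelity baseline: every training sample computed at the top fidelity.
single = training_set_cost({"def2-TZVP": 512})

# Multifidelity set: few top-fidelity samples, progressively more at cheaper fidelities.
multi_counts = nested_sample_counts(n_top=64)
multi = training_set_cost(multi_counts)

print(f"single-fidelity cost: {single:.0f} s")
print(f"multifidelity cost:   {multi:.0f} s")
```

In an actual benchmark, the cost computed this way would be paired with the prediction error of the model trained on each set, so that models are compared at equal data-generation time rather than at equal sample count.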

Code Implementations: 2 repos
