CHEM-PH LG COMP-PH

May 18, 2023

Multi-Fidelity Machine Learning for Excited State Energies of Molecules

Vivin Vinod, Sayan Maity, Peter Zaspel, Ulrich Kleinekathöfer

arXiv:2305.11292v13.314 citations

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of generating training data for molecular excited-state predictions, a problem relevant to chemists and materials scientists. The advance is incremental because it builds on existing multi-fidelity methods.

The paper tackles the challenge of accurately predicting molecular excited state energies by proposing a multi-fidelity machine learning approach that combines a small amount of high-accuracy data with cheaper, less accurate data. The resulting model reaches the same accuracy as models trained only on high-cost data, with an over 30-fold reduction in the computational effort needed to generate the training data.

The accurate but fast calculation of molecular excited states is still a very challenging topic. For many applications, detailed knowledge of the energy funnel in larger molecular aggregates is of key importance, which requires highly accurate excited state energies. To this end, machine learning techniques can be an extremely useful tool, though the cost of generating highly accurate training datasets remains a severe challenge. To overcome this hurdle, this work proposes the use of multi-fidelity machine learning, where very little training data from high-accuracy calculations is combined with cheaper and less accurate data to achieve the accuracy of the costlier level. In the present study, the approach is employed to predict the first excited state energies for three molecules of increasing size, namely, benzene, naphthalene, and anthracene. The energies are trained and tested for conformations stemming from classical molecular dynamics simulations and from real-time density functional tight-binding calculations. It can be shown that the multi-fidelity machine learning model can achieve the same accuracy as a machine learning model built only on high-cost training data while requiring a much lower computational effort to generate the data. The numerical gain observed in these benchmark test calculations was over a factor of 30 but certainly can be much higher for high-accuracy data.
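
The idea of combining many cheap samples with a few expensive ones can be illustrated with a minimal two-level sketch. This is not the authors' implementation: it is a simplified variant that fits one kernel ridge regression model to the cheap data and a second one to the high-minus-low difference at the few expensive points. The functions `low_fidelity` and `high_fidelity` are synthetic one-dimensional stand-ins for quantum chemistry calculations, and the helper names, kernel width, and regularization strength are all assumptions chosen for illustration.

```python
import math

def kernel(a, b, sigma=0.1):
    """Gaussian kernel between two scalar inputs (sigma is an assumed width)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krr(xs, ys, lam=1e-6):
    """Fit kernel ridge regression and return a prediction function."""
    K = [[kernel(a, b) + (lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return lambda x: sum(a * kernel(x, xi) for a, xi in zip(alpha, xs))

# Synthetic stand-ins (assumptions): a cheap method and a costlier one
# that differs from it by a smooth correction.
def low_fidelity(x):
    return math.sin(8.0 * x)

def high_fidelity(x):
    return math.sin(8.0 * x) + 0.3 * x

x_low = [i / 39 for i in range(40)]   # many cheap samples
x_high = [i / 4 for i in range(5)]    # very few expensive samples

# Single-fidelity reference: trained on the expensive samples only.
model_hf = fit_krr(x_high, [high_fidelity(x) for x in x_high])

# Two-level model: cheap baseline plus a learned high-minus-low correction.
model_low = fit_krr(x_low, [low_fidelity(x) for x in x_low])
model_delta = fit_krr(
    x_high, [high_fidelity(x) - low_fidelity(x) for x in x_high])

def model_mf(x):
    return model_low(x) + model_delta(x)

# Compare mean absolute errors against the expensive reference.
x_test = [i / 100 for i in range(101)]

def mae(model):
    return sum(abs(model(x) - high_fidelity(x)) for x in x_test) / len(x_test)

mae_hf = mae(model_hf)
mae_mf = mae(model_mf)
print(f"MAE, high-fidelity data only: {mae_hf:.4f}")
print(f"MAE, two-level model:         {mae_mf:.4f}")
```

In this toy setting the correction is much smoother than the target itself, so five expensive samples suffice to learn it, whereas the same five samples cannot resolve the full target on their own. The sketch says nothing about the actual cost savings reported above, which depend on the relative cost of the quantum chemistry methods involved.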
