Simple and efficient algorithms for training machine learning potentials to force data
This work addresses the problem of data efficiency for researchers in computational chemistry and materials science by providing a more efficient method for training on force data. The contribution appears incremental, however, since it builds on existing force-training approaches.
The authors tackle the computational cost of training machine learning potentials on atomic force data by introducing a new algorithm whose cost is only a few times that of training on energies alone, and they benchmark it on organic chemistry and bulk aluminum datasets.
Abstract
Machine learning models, trained on data from ab initio quantum simulations, are yielding molecular dynamics potentials with unprecedented accuracy. One limiting factor is the quantity of available training data, which can be expensive to obtain. A quantum simulation often provides all atomic forces in addition to the total energy of the system. These forces provide much more information than the energy alone: an N-atom configuration yields 3N force components but only one energy. It may appear that training a model to this large quantity of force data would introduce significant computational costs. In fact, training to all available force data should be only a few times more expensive than training to energies alone. Here, we present a new algorithm for efficient force training and benchmark its accuracy by training to forces from real-world datasets for organic chemistry and bulk aluminum.
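Why force training need cost only a few times more than energy training can be seen with a little calculus. Write the force loss as L = sum |F - F_ref|^2 with F = -dE/dR. Hold the force error v = F - F_ref fixed. Then dL/dtheta = -2 d/de [dE/dtheta](R + e v) at e = 0. In words, the parameter gradient of the force loss is a directional derivative, along v in position space, of the same dE/dtheta an energy-only trainer already computes. The sketch below approximates that derivative with a central finite difference. One gradient step then costs one force evaluation plus two evaluations of dE/dtheta, regardless of how many parameters the model has. It is an illustration under stated assumptions, not the paper's algorithm. The two-parameter Gaussian pair potential and every function name (`energy`, `forces`, `energy_param_grad`, `force_loss_grad_fd`, `force_loss_grad_brute`) are hypothetical. A real potential would obtain dE/dtheta by reverse-mode automatic differentiation rather than by hand.

```python
import numpy as np

# Hypothetical toy model for illustration only (not the paper's potential):
# E(R) = sum_{i<j} a * exp(-b * r_ij^2), with parameters theta = (a, b).


def pair_terms(R):
    """Index pairs (i < j), displacements R_i - R_j, and squared distances."""
    i, j = np.triu_indices(len(R), k=1)
    d = R[i] - R[j]                     # shape (P, 3)
    r2 = np.sum(d * d, axis=1)          # shape (P,)
    return i, j, d, r2


def energy(theta, R):
    a, b = theta
    _, _, _, r2 = pair_terms(R)
    return np.sum(a * np.exp(-b * r2))


def forces(theta, R):
    """Analytic forces F = -dE/dR for the toy potential."""
    a, b = theta
    i, j, d, r2 = pair_terms(R)
    # dE/dR_i for one pair is -2ab exp(-b r^2) (R_i - R_j); the force is its negative.
    f_pair = (2.0 * a * b * np.exp(-b * r2))[:, None] * d
    F = np.zeros_like(R)
    np.add.at(F, i, f_pair)             # accumulate onto atom i
    np.add.at(F, j, -f_pair)            # equal and opposite onto atom j
    return F


def energy_param_grad(theta, R):
    """dE/dtheta: what an energy-only trainer already computes."""
    a, b = theta
    _, _, _, r2 = pair_terms(R)
    e = np.exp(-b * r2)
    return np.array([np.sum(e), np.sum(-a * r2 * e)])


def force_loss(theta, R, F_ref):
    return np.sum((forces(theta, R) - F_ref) ** 2)


def force_loss_grad_fd(theta, R, F_ref, eps=1e-5):
    """dL/dtheta via a directional derivative of dE/dtheta along v = F - F_ref.

    Cost: one force evaluation plus two dE/dtheta evaluations, independent
    of the number of parameters.
    """
    v = forces(theta, R) - F_ref        # held fixed (not differentiated)
    g_plus = energy_param_grad(theta, R + eps * v)
    g_minus = energy_param_grad(theta, R - eps * v)
    return -2.0 * (g_plus - g_minus) / (2.0 * eps)


def force_loss_grad_brute(theta, R, F_ref, h=1e-6):
    """Reference gradient: central differences of L in each parameter.

    Cost grows linearly with the number of parameters.
    """
    g = np.zeros(len(theta))
    for k in range(len(theta)):
        dt = np.zeros(len(theta))
        dt[k] = h
        g[k] = (force_loss(theta + dt, R, F_ref)
                - force_loss(theta - dt, R, F_ref)) / (2.0 * h)
    return g


# Usage: synthetic reference forces from "true" parameters, then compare gradients.
rng = np.random.default_rng(0)
R = rng.normal(size=(5, 3))
F_ref = forces(np.array([1.0, 0.5]), R)
theta = np.array([0.8, 0.7])
print(force_loss_grad_fd(theta, R, F_ref), force_loss_grad_brute(theta, R, F_ref))
```

The two printed gradients should agree to several digits. The brute-force version needs two loss evaluations per parameter, whereas the directional-derivative version always needs three model evaluations. When dE/dtheta comes from reverse-mode differentiation at a cost of a few energy evaluations, the overhead of force training therefore stays a small constant factor. This is consistent with the abstract's claim, though the paper's own algorithm may differ in its details.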