chemtrain: Learning Deep Potential Models via Automatic Differentiation and Statistical Physics
This work targets researchers in computational chemistry and materials science who face costly reference data generation and data-inefficient training of potential models for molecular dynamics simulations. Its contribution is an incremental improvement: customizable training routines that simplify combining multiple training algorithms and data sources.
The paper tackles the problem of efficiently training neural network potential models for molecular dynamics. It introduces the chemtrain framework, which combines multiple top-down and bottom-up algorithms to incorporate diverse data sources, and demonstrates its utility by parametrizing models of titanium and alanine dipeptide.
Neural networks (NNs) are effective models for improving the accuracy of molecular dynamics, opening up new fields of application. Typically trained bottom-up, atomistic NN potential models can reach first-principles accuracy, while coarse-grained implicit solvent NN potentials surpass classical continuum solvent models. However, accurate reference data are costly to generate and common bottom-up training is data-inefficient; overcoming both limitations demands efficient incorporation of data from many sources. This paper introduces the framework chemtrain to learn sophisticated NN potential models through customizable training routines and advanced training algorithms. These routines can combine multiple top-down and bottom-up algorithms, e.g., to incorporate both experimental and simulation data or to pre-train potentials with less costly algorithms. chemtrain provides an object-oriented high-level interface to simplify the creation of custom routines. On the lower level, chemtrain relies on JAX to compute gradients and to scale the computations to the available resources. We demonstrate the simplicity and importance of combining multiple algorithms by parametrizing an all-atomistic model of titanium and a coarse-grained implicit solvent model of alanine dipeptide.
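To make the role of automatic differentiation concrete, the following is a minimal sketch of bottom-up force matching written in plain JAX. It does not use the chemtrain API. The toy Lennard-Jones-style energy stands in for an NN potential, and all names (pair_energy, forces, force_matching_loss, the parameter dictionary, the synthetic reference data, the learning rate) are illustrative assumptions rather than part of the framework.

```python
import jax
import jax.numpy as jnp


def pair_energy(params, positions):
    """Toy pair potential (Lennard-Jones form); a stand-in for an NN potential."""
    n = positions.shape[0]
    i, j = jnp.triu_indices(n, k=1)            # each unordered pair once
    disp = positions[i] - positions[j]
    r2 = jnp.sum(disp ** 2, axis=-1)
    sr6 = (params["sigma"] ** 2 / r2) ** 3
    return jnp.sum(4.0 * params["epsilon"] * (sr6 ** 2 - sr6))


def forces(params, positions):
    """Forces are the negative gradient of the energy w.r.t. positions."""
    return -jax.grad(pair_energy, argnums=1)(params, positions)


def force_matching_loss(params, batch_positions, batch_ref_forces):
    """Mean squared error between predicted and reference forces."""
    predicted = jax.vmap(forces, in_axes=(None, 0))(params, batch_positions)
    return jnp.mean((predicted - batch_ref_forces) ** 2)


# Synthetic "reference data" (assumption): forces from known parameters,
# evaluated on slightly perturbed copies of a 4-atom configuration.
base = jnp.array([[0.0, 0.0, 0.0],
                  [1.2, 0.0, 0.0],
                  [0.0, 1.2, 0.0],
                  [0.0, 0.0, 1.2]])
noise = 0.03 * jax.random.normal(jax.random.PRNGKey(0), (8, 4, 3))
batch_positions = base[None, :, :] + noise

true_params = {"epsilon": jnp.array(1.0), "sigma": jnp.array(1.0)}
batch_ref_forces = jax.vmap(forces, in_axes=(None, 0))(true_params, batch_positions)

# Start from a deliberately wrong guess and take plain gradient-descent steps.
params = {"epsilon": jnp.array(0.8), "sigma": jnp.array(1.05)}
loss_and_grad = jax.jit(jax.value_and_grad(force_matching_loss))

initial_loss, initial_grads = loss_and_grad(params, batch_positions, batch_ref_forces)
reference_loss = force_matching_loss(true_params, batch_positions, batch_ref_forces)

learning_rate = 1e-4  # small, hand-picked step size for this toy problem
for step in range(100):
    loss, grads = loss_and_grad(params, batch_positions, batch_ref_forces)
    params = jax.tree_util.tree_map(
        lambda p, g: p - learning_rate * g, params, grads
    )

print("initial loss:", float(initial_loss))
print("final loss:  ", float(loss))
```

Note that the parameter gradient differentiates through the force computation, which is itself a gradient of the energy; JAX handles this nesting without hand-derived expressions. Because the loss is an ordinary differentiable function, further terms could in principle be added to it before differentiation. How chemtrain itself composes top-down and bottom-up algorithms is not shown here.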