Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment
This work addresses generalization for researchers who use neural network model chemistries in computational chemistry. The contribution is incremental, since it evaluates existing sampling methods rather than proposing a new one.
The paper tackles poor generalization in neural network model chemistries caused by inefficient sampling of training data. It shows that Metadynamics (MetaMD) is a cheap, black-box method that ensures samples explore new regions of chemical space while remaining relevant to chemistry near $k_bT$, with cost scaling linearly with the number of atoms.
Neural network (NN) model chemistries (MCs) promise to facilitate the accurate exploration of chemical space and the simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and `test data' chosen randomly from the output of some sampling techniques can provide poor information about generality. If the sampling method is narrow, `test error' can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling (NMS), and one uncommon alternative, Metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package, with cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space while remaining relevant to chemistry near $k_bT$. It is one cheap tool to address the issue of generalization.
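To make the sampling idea concrete, the following is a minimal sketch of the generic metadynamics mechanism: Gaussian "bumps" are deposited at previously visited points, so the dynamics are pushed away from regions already sampled. It is an illustration under stated assumptions, not the authors' implementation. The one-dimensional double-well potential, the overdamped Langevin integrator, the choice of the coordinate itself as the biased variable, and all names and parameter values (`GaussianBumpBias`, `sample`, `height`, `width`, `stride`) are hypothetical and chosen only for this example.

```python
import numpy as np


class GaussianBumpBias:
    """History-dependent bias: a sum of Gaussians centred on visited points.

    Hypothetical helper for illustration; height and width are arbitrary.
    """

    def __init__(self, height=0.5, width=0.2):
        self.height = height
        self.width = width
        self.centers = []

    def deposit(self, s):
        # Remember the current position so it is penalised from now on.
        self.centers.append(np.atleast_1d(np.asarray(s, dtype=float)).copy())

    def _displacements(self, s):
        s = np.atleast_1d(np.asarray(s, dtype=float))
        return s - np.array(self.centers)  # shape (n_bumps, dim)

    def energy(self, s):
        if not self.centers:
            return 0.0
        d = self._displacements(s)
        r2 = np.sum(d * d, axis=1)
        return float(self.height * np.sum(np.exp(-r2 / (2.0 * self.width**2))))

    def force(self, s):
        # Negative gradient of the bias energy: points away from each bump centre.
        s = np.atleast_1d(np.asarray(s, dtype=float))
        if not self.centers:
            return np.zeros_like(s)
        d = self._displacements(s)
        r2 = np.sum(d * d, axis=1)
        w = self.height * np.exp(-r2 / (2.0 * self.width**2)) / self.width**2
        return np.sum(w[:, None] * d, axis=0)


def double_well_force(x):
    # Toy potential V(x) = 5 (x^2 - 1)^2, so the force is -dV/dx.
    return -20.0 * x * (x**2 - 1.0)


def sample(n_steps, bias=None, stride=25, dt=1e-3, kT=1.0, seed=0):
    """Overdamped Langevin dynamics (unit friction), optionally with a bump bias."""
    rng = np.random.default_rng(seed)
    x = np.array([-1.0])  # start in the left well
    traj = np.empty(n_steps)
    for i in range(n_steps):
        f = double_well_force(x)
        if bias is not None:
            if i % stride == 0:
                bias.deposit(x)  # add a bump at the current position
            f = f + bias.force(x)
        x = x + dt * f + np.sqrt(2.0 * kT * dt) * rng.standard_normal(1)
        traj[i] = x[0]
    return traj
```

One way to use the sketch is to call `sample(n_steps)` and `sample(n_steps, bias=GaussianBumpBias())` with the same seed and compare the range of positions each trajectory visits. Each step evaluates every bump deposited so far, so the cost of this naive version grows with the length of the history; the bump height, width, and deposition stride control how aggressively the trajectory is driven out of regions it has already visited.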