Data-driven path collective variables
This work addresses the problem of enhancing sampling efficiency in molecular simulations, a problem relevant to computational chemistry and materials science, and represents an incremental advance in collective variable design.
The authors tackle the challenge of identifying optimal collective variables for atomic-scale simulations by proposing a data-driven generalization of path collective variables: a kernel ridge regression of the committor probability that yields interpretable, differentiable variables. They validate the method on a precipitation model and on Li⁺-F⁻ association in water, showing improved accuracy over simpler variables and gaining insight into solvation effects.
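To make the idea of "generalizing path collective variables with kernel ridge regression" concrete, the two expressions below contrast the classical path collective variable, in the form commonly attributed to Branduardi, Gervasio and Parrinello, with a generic kernel ridge regression estimate. The notation (distance d, smoothing parameter λ, kernel k, regularization γ, committor estimates q_j) is chosen here for illustration and is not taken from the paper; reading the method as "replace fixed path indices with learned committor weights" is a hedged interpretation, not the authors' own derivation.

```latex
% Path CV over P reference configurations R_1, ..., R_P ordered along a path
s(R) = \frac{1}{P-1}\,
       \frac{\sum_{i=1}^{P} (i-1)\, e^{-\lambda\, d(R, R_i)}}
            {\sum_{i=1}^{P} e^{-\lambda\, d(R, R_i)}}

% Kernel ridge regression of committor estimates q_j on n sampled configurations x_j
\hat{q}(x) = \sum_{j=1}^{n} \alpha_j\, k(x, x_j),
\qquad
\boldsymbol{\alpha} = (\mathbf{K} + \gamma \mathbf{I})^{-1} \mathbf{q},
\qquad
K_{jl} = k(x_j, x_l)
```

In the path variable, each reference frame contributes its fixed index through a normalized exponential weight. In the regression, the weights α_j are instead fitted by regularized least squares to committor values, any descriptor space and kernel can be used, and the result remains differentiable in x whenever k is.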
Identifying optimal collective variables to model transformations using atomic-scale simulations is a long-standing challenge. We propose a new method for the generation, optimization, and comparison of collective variables, which can be thought of as a data-driven generalization of the path collective variable concept. It consists of a kernel ridge regression of the committor probability, which encodes a transformation's progress. The resulting collective variable is one-dimensional, interpretable, and differentiable, making it appropriate for enhanced sampling simulations that require biasing. We demonstrate the validity of the method on two applications: a precipitation model and the association of Li$^+$ and F$^-$ in water. For the former, we show that global descriptors such as the permutation invariant vector make it possible to reach an accuracy far beyond that achieved \textit{via} simpler, more intuitive variables. For the latter, we show that information correlated with the transformation mechanism is contained only in the first solvation shell, and that inertial effects prevent the derivation of optimal collective variables from atomic positions alone.
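The regression step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, precomputed descriptor vectors (for instance a permutation invariant vector) and a committor estimate for each training configuration, and it uses hypothetical names (`KRRCommittorCV`, `gaussian_kernel`, `sigma`, `reg`). The analytic gradient is what a biasing code would need, combined with the descriptor's own derivatives through the chain rule, to turn the variable into forces.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix k(a, b) = exp(-|a - b|^2 / (2 sigma^2)) between rows of A and B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))


class KRRCommittorCV:
    """Hypothetical sketch: a 1D collective variable fitted to committor estimates by KRR."""

    def __init__(self, sigma=1.0, reg=1e-6):
        self.sigma = sigma  # kernel width (assumed hyperparameter)
        self.reg = reg      # ridge regularization strength (assumed hyperparameter)

    def fit(self, X, q):
        # X: (n, d) descriptor vectors; q: (n,) committor estimates in [0, 1]
        self.X = np.asarray(X, dtype=float)
        K = gaussian_kernel(self.X, self.X, self.sigma)
        # Solve the ridge-regularized system (K + reg * I) alpha = q
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(self.X)),
                                     np.asarray(q, dtype=float))
        return self

    def value(self, x):
        # CV value s(x) = sum_j alpha_j k(x, x_j)
        x2d = np.atleast_2d(np.asarray(x, dtype=float))
        k = gaussian_kernel(x2d, self.X, self.sigma)[0]
        return float(k @ self.alpha)

    def gradient(self, x):
        # Analytic gradient: d k(x, x_j) / dx = -k(x, x_j) (x - x_j) / sigma^2
        x = np.asarray(x, dtype=float)
        k = gaussian_kernel(x[None, :], self.X, self.sigma)[0]
        return -((self.alpha * k) @ (x[None, :] - self.X)) / self.sigma**2


# Toy usage: 2D descriptors whose committor rises smoothly along the first axis
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
q = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
cv = KRRCommittorCV(sigma=0.5, reg=1e-4).fit(X, q)
x0 = np.array([0.3, -0.1])
print(cv.value(x0), cv.gradient(x0))
```

The synthetic sigmoid stands in for real committor estimates, which are typically obtained by launching many short trajectories from each configuration and counting how many reach the product state. The Gaussian kernel is used here only because its gradient has a simple closed form; other differentiable kernels would work the same way.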