ML CE BM Feb 28, 2018

Automated design of collective variables using supervised machine learning

arXiv:1802.10510v2, 131 citations
Originality: Highly original
AI Analysis

This work addresses a key bottleneck in computational biophysics by offering a novel method to automate CV selection, though the approach is incremental in that it builds on existing ML techniques.

The paper tackles the challenge of selecting initial collective variables (CVs) for enhanced sampling in molecular simulations. It proposes a data-driven approach that uses the decision functions of supervised machine learning models as CVs, and demonstrates that these CVs reversibly sample slow structural transitions in two test cases: solvated alanine dipeptide and the Chignolin mini-protein.

Selection of appropriate collective variables for enhancing sampling of molecular simulations remains an unsolved problem in computational biophysics. In particular, picking initial collective variables (CVs) is especially challenging in higher dimensions. Which atomic coordinates, or transforms thereof, from a list of thousands should one pick for enhanced sampling runs? How does a modeler even begin to pick starting coordinates for investigation? This remains true even in the case of simple two-state systems and only increases in difficulty for multi-state systems. In this work, we solve the initial CV problem using a data-driven approach inspired by the field of supervised machine learning. Specifically, we show how the decision functions in supervised machine learning (SML) algorithms can be used as initial CVs (SML_cv) for accelerated sampling. Using solvated alanine dipeptide and the Chignolin mini-protein as our test cases, we illustrate how the distance to the Support Vector Machines' decision hyperplane, the output probability estimates from Logistic Regression, the outputs from deep neural network classifiers, and other classifiers may be used to reversibly sample slow structural transitions. We discuss the utility of other SML algorithms that might be useful for identifying CVs for accelerating molecular simulations.
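
As a rough illustration of the idea described in the abstract, the sketch below trains a small logistic regression classifier on two labeled states and uses its output probability as a one-dimensional CV. It is not the authors' implementation: the basin centers, the Gaussian synthetic data, the sin/cos featurization of two dihedral angles, and the names `featurize`, `train_logistic`, and `sml_cv` are all assumptions made for this example, and a real workflow would use features computed from short simulations started in each state.

```python
import math
import random

def featurize(phi, psi):
    """Map two dihedral angles (radians) to periodic sin/cos features."""
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def sml_cv(phi, psi, w, b):
    """Candidate CV: the classifier's predicted probability of state 1."""
    z = sum(wj * xj for wj, xj in zip(w, featurize(phi, psi))) + b
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic stand-in for frames sampled in two metastable basins.
# The basin centers below are hypothetical values chosen for illustration.
rng = random.Random(0)

def sample_state(phi0, psi0, n, sigma=0.3):
    return [(rng.gauss(phi0, sigma), rng.gauss(psi0, sigma)) for _ in range(n)]

state0 = sample_state(-1.4, 2.6, 200)  # labeled 0
state1 = sample_state(1.0, -1.0, 200)  # labeled 1

X = [featurize(phi, psi) for phi, psi in state0 + state1]
y = [0] * len(state0) + [1] * len(state1)
w, b = train_logistic(X, y)

# The CV should be near 0 in state 0 and near 1 in state 1.
cv0 = sml_cv(-1.4, 2.6, w, b)
cv1 = sml_cv(1.0, -1.0, w, b)
print(f"CV at state-0 center: {cv0:.3f}")
print(f"CV at state-1 center: {cv1:.3f}")
```

Because the CV is a smooth function of the input features, it can in principle be handed to an enhanced sampling method as the coordinate to bias. For an SVM, the analogous quantity would be the signed distance to the decision hyperplane, (w·x + b)/‖w‖, rather than a probability.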
