COMP-PH · LG · CHEM-PH · Oct 13, 2022

Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations

MIT
arXiv:2210.07237v2 · 213 citations · h-index: 109 · Has Code
AI Analysis

This work addresses a critical gap for researchers in computational chemistry and materials science by providing a more practical evaluation framework, though it is incremental as it builds on existing benchmarking practices.

The authors address the problem that machine learning force fields are typically benchmarked only on force and energy prediction errors, which may not align with realistic molecular dynamics simulation outcomes. They introduce a benchmark suite that evaluates state-of-the-art models on curated systems, including water, organic molecules, a peptide, and materials. The results show that force accuracy often fails to correlate with the relevant simulation metrics, and the authors identify stability as a key area for improvement.

Molecular dynamics (MD) simulation techniques are widely used for various natural science applications. Increasingly, machine learning (ML) force field (FF) models begin to replace ab-initio simulations by predicting forces directly from atomic structures. Despite significant progress in this area, such techniques are primarily benchmarked by their force/energy prediction errors, even though the practical use case would be to produce realistic MD trajectories. We aim to fill this gap by introducing a novel benchmark suite for learned MD simulation. We curate representative MD systems, including water, organic molecules, a peptide, and materials, and design evaluation metrics corresponding to the scientific objectives of respective systems. We benchmark a collection of state-of-the-art (SOTA) ML FF models and illustrate, in particular, how the commonly benchmarked force accuracy is not well aligned with relevant simulation metrics. We demonstrate when and how selected SOTA methods fail, along with offering directions for further improvement. Specifically, we identify stability as a key metric for ML models to improve. Our benchmark suite comes with a comprehensive open-source codebase for training and simulation with ML FFs to facilitate future work.
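To make the distinction between force accuracy and simulation quality concrete, here is a minimal sketch of the two kinds of metric computed side by side. It is illustrative only and is not taken from the benchmark's codebase. The function names `force_mae` and `stable_steps`, the bond-length criterion, and the 0.5 tolerance are all assumptions chosen for the example.

```python
import numpy as np


def force_mae(pred_forces, ref_forces):
    """Mean absolute error over all atoms and Cartesian components."""
    diff = np.asarray(pred_forces, dtype=float) - np.asarray(ref_forces, dtype=float)
    return float(np.mean(np.abs(diff)))


def stable_steps(traj, bonds, ref_lengths, tol=0.5):
    """Count the leading frames in which every bond stays within `tol`
    of its reference length (a hypothetical stability criterion).

    traj:        (n_frames, n_atoms, 3) positions
    bonds:       (n_bonds, 2) integer atom-index pairs
    ref_lengths: (n_bonds,) reference bond lengths, same units as traj
    """
    traj = np.asarray(traj, dtype=float)
    bonds = np.asarray(bonds, dtype=int)
    ref_lengths = np.asarray(ref_lengths, dtype=float)

    # Bond vectors for every frame: shape (n_frames, n_bonds, 3)
    vec = traj[:, bonds[:, 0]] - traj[:, bonds[:, 1]]
    lengths = np.linalg.norm(vec, axis=-1)

    # A frame is "broken" if any bond deviates by more than tol
    broken = np.any(np.abs(lengths - ref_lengths) > tol, axis=1)
    return int(np.argmax(broken)) if broken.any() else len(traj)


# Toy diatomic trajectory: atom 0 fixed at the origin, atom 1 drifting along x.
x = np.array([1.0, 1.1, 1.2, 1.8, 2.5])
traj = np.zeros((5, 2, 3))
traj[:, 1, 0] = x

mae = force_mae(np.zeros((2, 3)), np.full((2, 3), 0.1))
n_stable = stable_steps(traj, bonds=[[0, 1]], ref_lengths=[1.0], tol=0.5)
print(mae, n_stable)
```

In this toy case the force error is small and uniform, yet the trajectory leaves the tolerated bond-length range at the fourth frame. A model can therefore score well on one metric and poorly on the other, which is the kind of mismatch the abstract describes.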

Code Implementations: 3 repos
