From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures

Ryan Liu, Eric Qu, Tobias Kreiman, Samuel M. Blau, Aditi S. Krishnapriyan

arXiv:2602.04861v1
2.71 citations
h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the challenge of evaluating and designing machine learning interatomic potentials (MLIPs) for computational chemistry and materials science. It offers a more efficient metric for detecting physical artifacts, though the advance is incremental because it builds on existing MLIP frameworks.

The paper tackles a failure mode of Machine Learning Interatomic Potentials (MLIPs): they can fail to reproduce the physical smoothness of the potential energy surface, and standard evaluations miss this. It introduces the Bond Smoothness Characterization Test (BSCT) as an efficient benchmark. BSCT correlates strongly with molecular dynamics stability at a fraction of the cost, and using it to guide model refinements yields an MLIP with low regression error, stable simulations, and robust predictions.

Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we use an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce the artifacts identified by our metric. By systematically optimizing the model design based on BSCT, the resulting MLIP simultaneously achieves low conventional energy/force (E/F) regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric and an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that current MLIP benchmarks cannot efficiently evaluate.
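The abstract describes BSCT only at a high level, so the following is a minimal sketch of the general idea of a bond-deformation smoothness probe, not the authors' protocol. Everything here is an assumption for illustration: a hypothetical model callable `model(positions) -> (energy, forces)`, a single diatomic stretched along x, a toy Morse pair (`smooth_model`) standing in for a smooth MLIP, and a deliberately unshifted hard cutoff (`hard_cutoff_model`) standing in for an artifact. The three checks (energy changes not explained by the forces, a count of local minima along the scan, and net force as a spurious-force indicator) and the tolerance `tol` are our own choices.

```python
import numpy as np


def morse(r, d_e=4.7, a=1.9, r_e=0.74):
    """Toy Morse bond: returns energy and dE/dr. Parameters are illustrative."""
    x = np.exp(-a * (r - r_e))
    energy = d_e * (1.0 - x) ** 2
    d_energy = 2.0 * d_e * a * x * (1.0 - x)
    return energy, d_energy


def smooth_model(pos):
    """Hypothetical MLIP stand-in: conservative Morse pair, forces = -dE/dpos."""
    r_vec = pos[1] - pos[0]
    r = np.linalg.norm(r_vec)
    e, de = morse(r)
    u = r_vec / r
    forces = np.stack([de * u, -de * u])
    return e, forces


def hard_cutoff_model(pos, r_cut=2.0):
    """Same pair, truncated at r_cut without shifting: a deliberate discontinuity."""
    e, f = smooth_model(pos)
    if np.linalg.norm(pos[1] - pos[0]) >= r_cut:
        return 0.0, np.zeros_like(f)
    return e, f


def bond_scan(model, r_grid):
    """Stretch a diatomic along x; record energy, bond-axis force, and net force."""
    energies, f_bond, net_force = [], [], []
    for r in r_grid:
        pos = np.array([[0.0, 0.0, 0.0], [r, 0.0, 0.0]])
        e, f = model(pos)
        energies.append(e)
        f_bond.append(f[1, 0])                           # force on atom 1 along +x, i.e. -dE/dr
        net_force.append(np.linalg.norm(f.sum(axis=0)))  # ~0 for a translation-invariant model
    return np.array(energies), np.array(f_bond), np.array(net_force)


def smoothness_report(r_grid, energies, f_bond, net_force, tol=0.05):
    """Flag energy jumps the forces cannot explain, count local minima, report net force."""
    dr = np.diff(r_grid)
    de_from_forces = -0.5 * (f_bond[:-1] + f_bond[1:]) * dr  # trapezoid rule on -F dr
    mismatch = np.abs(np.diff(energies) - de_from_forces)
    mid = energies[1:-1]
    n_minima = int(np.sum((mid < energies[:-2]) & (mid < energies[2:])))
    return {
        "discontinuity_at": r_grid[1:][mismatch > tol].tolist(),
        "n_local_minima": n_minima,
        "max_net_force": float(net_force.max()),
    }


r = np.linspace(0.5, 3.0, 251)
print(smoothness_report(r, *bond_scan(smooth_model, r)))
print(smoothness_report(r, *bond_scan(hard_cutoff_model, r)))
```

The energy-force consistency check is what catches the cutoff artifact: just past `r_cut` the energy drops sharply, while the forces on either side of the step imply almost no change. A real probe would scan many bonds in many molecules and would need tolerances calibrated to the model's noise level.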
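The abstract names a new differentiable $k$-nearest neighbors algorithm but does not describe it. As a generic illustration of why hard top-$k$ neighbor selection is a smoothness hazard, here is a sigmoid-relaxed selection. The threshold rule (midway between the $k$-th and $(k+1)$-th distances), the width `tau`, and the function names are our assumptions. This is not the authors' algorithm.

```python
import numpy as np


def sigmoid(x):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def hard_knn_mask(d, k):
    """Standard top-k selection: 1 for the k closest neighbors, 0 otherwise."""
    mask = np.zeros_like(d)
    mask[np.argsort(d)[:k]] = 1.0
    return mask


def soft_knn_weights(d, k, tau=0.05):
    """Sigmoid-relaxed top-k (illustrative only).

    The cutoff sits midway between the k-th and (k+1)-th smallest distances.
    Sorted distances are continuous in d, so the weights vary continuously
    as neighbors swap rank; tau sets the width of the soft boundary.
    """
    assert len(d) > k, "need at least k + 1 candidate neighbors"
    s = np.sort(d)
    threshold = 0.5 * (s[k - 1] + s[k])
    return sigmoid((threshold - d) / tau)


# Neighbors 2 and 3 swap rank when their distances change by 0.002 (illustrative values).
d_before = np.array([1.0, 1.5, 1.999, 2.001])
d_after = np.array([1.0, 1.5, 2.001, 1.999])
k = 3
print(np.abs(hard_knn_mask(d_before, k) - hard_knn_mask(d_after, k)).max())
print(np.abs(soft_knn_weights(d_before, k) - soft_knn_weights(d_after, k)).max())
```

Under a tiny perturbation, the hard mask flips a neighbor completely in or out, which makes any downstream energy jump. The relaxed weights move by only about 0.01. Note that this relaxation is continuous but not smooth everywhere, since sorted distances have kinks at ties.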
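The abstract also credits temperature-controlled attention with reducing artifacts, without saying how the temperature is set or where it enters. The sketch below assumes the simplest form, a scalar `temperature` dividing the scaled dot-product logits, and uses hypothetical toy 2-D queries, keys, and values. It shows one reason temperature matters for smoothness: when two attention scores nearly tie, a low temperature makes the output switch almost discontinuously under a tiny perturbation of a key.

```python
import numpy as np


def softmax(x, axis=-1):
    """Softmax with max-subtraction for numerical stability."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an explicit softmax temperature (assumed form)."""
    logits = q @ k.T / (np.sqrt(q.shape[-1]) * temperature)
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


def output_jump(temperature, delta=1e-3):
    """Change in the attention output when two nearly tied keys swap order."""
    q = np.array([[1.0, 0.0]])
    v = np.array([[0.0], [1.0]])
    k_plus = np.array([[1.0, 0.0], [1.0 + delta, 0.0]])
    k_minus = np.array([[1.0, 0.0], [1.0 - delta, 0.0]])
    out_plus, _ = attention(q, k_plus, v, temperature)
    out_minus, _ = attention(q, k_minus, v, temperature)
    return float(np.abs(out_plus - out_minus).max())


for t in (1e-4, 1e-2, 1.0):
    print(t, output_jump(t))
```

At `temperature=1e-4` the attention behaves almost like an argmax, so a change of 0.002 in one key swings the output from about 0 to about 1. At `temperature=1.0` the same perturbation changes the output by well under 0.01.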

