ML, Mar 11
Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches
Rohit Goswami
Accelerating the exploration of stationary points on potential energy surfaces by building local surrogates spans decades of effort. Done correctly, surrogates reduce the required evaluations by an order of magnitude while preserving the accuracy of the underlying theory. We present a unified Bayesian optimization view of minimization, single point saddle searches, and double-ended saddle searches through a single six-step surrogate loop, differing only in the inner optimization target and acquisition criterion. The framework uses Gaussian process regression with derivative observations, inverse-distance kernels, and active learning. The Optimal Transport GP additions (farthest point sampling with the Earth mover's distance, MAP regularization via a variance barrier and oscillation detection, and an adaptive trust radius) are concrete extensions of the same basic methodology that improve accuracy and efficiency. We also demonstrate that random Fourier features decouple hyperparameter training from prediction, enabling favorable scaling for high-dimensional systems. Accompanying pedagogical Rust code demonstrates that all applications use the exact same Bayesian optimization loop, bridging the gap between theoretical formulation and practical execution.
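As a rough illustration of the random Fourier feature idea mentioned in the abstract, the sketch below approximates a squared-exponential kernel by an inner product of random cosine features. It is not the paper's implementation: the plain Cartesian squared-exponential kernel (rather than an inverse-distance kernel), the function names `make_rff` and `rbf_kernel`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, n_features, length_scale, seed=0):
    """Return a feature map z(x) with z(x).z(y) ~ exp(-|x-y|^2 / (2 l^2)).

    Hypothetical helper: frequencies are drawn from N(0, 1/l^2) and phases
    from U(0, 2*pi), the standard construction for this kernel.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return features


def rbf_kernel(x, y, length_scale):
    """Exact squared-exponential kernel used as the reference value."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * length_scale ** 2))


# Compare the feature-space inner product with the exact kernel value.
features = make_rff(dim=3, n_features=4000, length_scale=1.5)
x = [0.1, -0.4, 0.7]
y = [0.5, 0.2, -0.3]
approx = sum(a * b for a, b in zip(features(x), features(y)))
exact = rbf_kernel(x, y, 1.5)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

Because the features are fixed once the frequencies are drawn, predictions reduce to linear algebra in a feature space of fixed size, which is one plausible reading of how such features help with scaling.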
CHEM-PH, Dec 8, 2025
Two-dimensional RMSD projections for reaction path visualization and validation
Rohit Goswami
Transition state or minimum energy path finding methods constitute a routine component of the computational chemistry toolkit. Standard analysis involves trajectories conventionally plotted in terms of the energy relative to the initial state against a cumulative displacement variable or the image number. These dimensional reductions obscure structural rearrangements in high dimensions and are often trajectory dependent. This precludes comparing the optimization trajectories of different methods beyond the number of calculations, the time taken, and the final saddle geometry. We present a method that maps trajectories onto a two-dimensional surface defined by a permutation-corrected root mean square deviation from the reactant and product configurations. Energy is represented as an interpolated color-mapped surface constructed from all optimization steps using radial basis functions. This representation highlights optimization trajectories, identifies endpoint basins, and diagnoses convergence concerns that are invisible in one-dimensional profiles. We validate the framework on a cycloaddition reaction, showing that a machine-learned potential saddle and a density functional theory reference lie on comparable energy contours despite geometric displacements.
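The projection described above can be sketched in a few lines: each frame of a path is mapped to the pair (RMSD to reactant, RMSD to product). This is a minimal sketch under simplifying assumptions: it skips the permutation correction and any rotational alignment the method relies on, uses a hypothetical two-atom system, and the names `rmsd` and `project_path` are illustrative only.

```python
import math


def rmsd(a, b):
    """Plain RMSD between two geometries given as lists of (x, y, z).

    Assumption: atoms are already matched and aligned; no permutation
    correction or superposition is performed here.
    """
    sq = sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return math.sqrt(sq / len(a))


def project_path(frames, reactant, product):
    """Map each frame to 2D coordinates (RMSD to reactant, RMSD to product)."""
    return [(rmsd(f, reactant), rmsd(f, product)) for f in frames]


# Hypothetical two-atom example: the second atom moves from x=1 to x=2.
reactant = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
product = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frames = [
    [tuple(r + t * (p - r) for r, p in zip(ra, pa))
     for ra, pa in zip(reactant, product)]
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]
points = project_path(frames, reactant, product)
for d_r, d_p in points:
    print(f"{d_r:.4f} {d_p:.4f}")
```

In such a plot the reactant sits on one axis and the product on the other, so a path that wanders away from a straight interpolation shows up as a detour in the plane.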
CHEM-PH, Jan 19
Reorienting off-path Nudged Elastic Bands (RONEB) via Minimum Mode Following
Rohit Goswami, Miha Gunde, Hannes Jónsson
Accurate determination of transition states remains central to understanding reaction kinetics. Double-ended methods like the Nudged Elastic Band (NEB) ensure relevant transition states and paths, but incur high computational costs and suffer from stagnation on flat or rough potential energy surfaces. Conversely, single-ended eigenmode-following techniques offer efficiency but often cannot be constrained between specific states. Here, we present the Reorienting Off-path Nudged Elastic Bands (RONEB), an adaptive hybrid algorithm that integrates the double-ended nature of the NEB with the acceleration of single-ended Min-Mode Following methods. RONEB provides stability based on the history of the path optimization, relative force triggering, and an alignment-based back-off penalty to dynamically decouple the climbing image from the elastic band constraints. We benchmark the method against the standard Climbing Image NEB (CI-NEB) across the Baker-Chan transition state test set using the PET-MAD machine-learned potential and the OptBench Pt(111) heptamer island surface diffusion set. A Bayesian analysis of the performance data quantifies a median reduction in gradient calls of 46.3% [95% CrI: -54.7%, -36.9%] relative to the baseline, while surface diffusion tests reveal a 28% reduction across 59 metallic rearrangement mechanisms. These results establish RONEB as a highly effective tool for high-throughput automated chemical discovery.
CHEM-PH, May 18, 2025
Efficient Implementation of Gaussian Process Regression Accelerated Saddle Point Searches with Application to Molecular Reactions
Rohit Goswami, Maxim Masterov, Satish Kamath et al.
The task of locating first-order saddle points on high-dimensional surfaces describing the variation of energy as a function of atomic coordinates is an essential step for identifying the mechanism and estimating the rate of thermally activated events within the harmonic approximation of transition state theory. When combined directly with electronic structure calculations, the number of energy and atomic force evaluations needed for convergence is a primary issue. Here, we describe an efficient implementation of Gaussian process regression (GPR) acceleration of the minimum mode following method where a dimer is used to estimate the lowest eigenmode of the Hessian. A surrogate energy surface is constructed and updated after each electronic structure calculation. The method is applied to a test set of 500 molecular reactions previously generated by Hermez and coworkers [J. Chem. Theory Comput. 18, 6974 (2022)]. An order of magnitude reduction in the number of electronic structure calculations needed to reach the saddle point configurations is obtained by using the GPR compared to the dimer method. Despite the wide range in stiffness of the molecular degrees of freedom, the calculations are carried out using Cartesian coordinates and are found to require a similar number of electronic structure calculations as an elaborate internal coordinate method implemented in the Sella software package. The present implementation of the GPR surrogate model in C++ is efficient enough for the wall time of the saddle point searches to be reduced in 3 out of 4 cases even though the calculations are carried out at a low Hartree-Fock level.
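The role of the dimer can be illustrated with a toy example: two points displaced symmetrically about a configuration give a finite-difference estimate of the curvature along their separation, and the orientation with the lowest curvature approximates the lowest Hessian eigenmode. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses a hypothetical 2D quadratic surface, and it finds the orientation by a brute-force angle scan, whereas practical dimer methods rotate the dimer with an optimizer.

```python
import math


def gradient(x, y):
    """Gradient of the toy surface E = 0.5 * (-1.0 * x**2 + 2.0 * y**2).

    Hypothetical test surface with a first-order saddle at the origin.
    """
    return (-1.0 * x, 2.0 * y)


def dimer_curvature(point, direction, delta=1e-3):
    """Finite-difference curvature along a unit direction from two gradients."""
    px, py = point
    nx, ny = direction
    g_plus = gradient(px + delta * nx, py + delta * ny)
    g_minus = gradient(px - delta * nx, py - delta * ny)
    return ((g_plus[0] - g_minus[0]) * nx
            + (g_plus[1] - g_minus[1]) * ny) / (2.0 * delta)


def lowest_mode(point, n_angles=180):
    """Scan dimer orientations over a half circle and keep the lowest curvature."""
    best = None
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        direction = (math.cos(theta), math.sin(theta))
        curvature = dimer_curvature(point, direction)
        if best is None or curvature < best[0]:
            best = (curvature, direction)
    return best


curvature, mode = lowest_mode((0.3, -0.2))
print(f"curvature={curvature:.6f} mode=({mode[0]:.3f}, {mode[1]:.3f})")
```

Each curvature estimate costs two gradient evaluations, which is why replacing most of them with evaluations on a surrogate surface can reduce the number of electronic structure calculations.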
CHEM-PH, Oct 24, 2025
Efficient Exploration of Chemical Kinetics
Rohit Goswami
Estimating reaction rates and chemical stability is fundamental, yet efficient methods for large-scale simulations remain out of reach despite advances in modeling and exascale computing. Direct simulation is limited by short timescales; machine-learned potentials require large data sets and struggle with transition state regions essential for reaction rates. Reaction network exploration with sufficient accuracy is hampered by the computational cost of electronic structure calculations, and even simplifications like harmonic transition state theory rely on prohibitively expensive saddle point searches. Surrogate model-based acceleration has been promising but hampered by overhead and numerical instability. This dissertation presents a holistic solution, co-designing physical representations, statistical models, and systems architecture in the Optimal Transport Gaussian Process (OT-GP) framework. Using physics-aware optimal transport metrics, OT-GP creates compact, chemically relevant surrogates of the potential energy surface, underpinned by statistically robust sampling. Alongside EON software rewrites for long timescale simulations, we introduce reinforcement learning approaches for both minimum-mode following (when the final state is unknown) and nudged elastic band methods (when endpoints are specified). Collectively, these advances establish a representation-first, modular approach to chemical kinetics simulation. Large-scale benchmarks and Bayesian hierarchical validation demonstrate state-of-the-art performance and practical exploration of chemical kinetics, transforming a longstanding theoretical promise into a working engine for discovery.
CHEM-PH, Oct 7, 2025
Adaptive Pruning for Increased Robustness and Reduced Computational Overhead in Gaussian Process Accelerated Saddle Point Searches
Rohit Goswami, Hannes Jónsson
Gaussian process (GP) regression provides a strategy for accelerating saddle point searches on high-dimensional energy surfaces by reducing the number of times the energy and its derivatives with respect to atomic coordinates need to be evaluated. The computational overhead in the hyperparameter optimization can, however, be large and make the approach inefficient. Failures can also occur if the search ventures too far into regions that are not represented well enough by the GP model. Here, these challenges are resolved by using geometry-aware optimal transport measures and an active pruning strategy that uses a summation over Wasserstein-1 distances for each atom type in farthest-point sampling, selecting a fixed-size subset of geometrically diverse configurations to avoid the rapidly increasing cost of GP updates as more observations are made. Stability is enhanced by a permutation-invariant metric that provides a reliable trust radius for early stopping and by a logarithmic barrier penalty on the growth of the signal variance. These physically motivated algorithmic changes prove their efficacy by reducing the mean computational time to less than half on a set of 238 challenging configurations from a previously published data set of chemical reactions. With these improvements, the GP approach is established as a robust and scalable algorithm for accelerating saddle point searches when the evaluation of the energy and atomic forces requires significant computational effort.
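The pruning idea can be sketched as greedy farthest-point sampling under a distance that sums one-dimensional Wasserstein-1 distances over atom types. The representation here is an assumption made for illustration: each configuration is a dictionary mapping an atom type to an equally sized list of scalar descriptors, for which the Wasserstein-1 distance reduces to the mean absolute difference of sorted values. The paper's actual descriptors and the function names below are not taken from the source.

```python
def wasserstein1_1d(u, v):
    """Wasserstein-1 distance between two equal-size 1D empirical samples."""
    if len(u) != len(v):
        raise ValueError("samples must have the same size in this sketch")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def config_distance(c1, c2):
    """Sum of per-atom-type Wasserstein-1 distances (hypothetical descriptor)."""
    return sum(wasserstein1_1d(c1[t], c2[t]) for t in c1)


def farthest_point_sampling(configs, k, start=0):
    """Greedily pick up to k indices, each maximizing distance to the chosen set."""
    selected = [start]
    min_dist = [config_distance(c, configs[start]) for c in configs]
    while len(selected) < min(k, len(configs)):
        nxt = max(range(len(configs)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, c in enumerate(configs):
            min_dist[i] = min(min_dist[i], config_distance(c, configs[nxt]))
    return selected


# Hypothetical descriptors for four configurations with atom types C and H.
configs = [
    {"C": [1.0, 1.5], "H": [1.0]},
    {"C": [1.0, 1.6], "H": [1.1]},  # close to configuration 0
    {"C": [2.0, 2.5], "H": [2.0]},  # far from configuration 0
    {"C": [1.5, 2.0], "H": [1.5]},  # in between
]
selected = farthest_point_sampling(configs, k=3)
print(selected)
```

Sorting the descriptors within each atom type makes the distance independent of the ordering of like atoms, and keeping the subset at a fixed size bounds the cost of each GP update.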