Outlier-Detection for Reactive Machine Learned Potential Energy Surfaces
This work addresses improving the accuracy of machine-learned potential energy surfaces for computational chemistry, although the contribution is incremental: it applies existing methods to a specific domain.
The study tackles outlier detection in reactive molecular potential energy surfaces by comparing three uncertainty quantification methods (Ensembles, Deep Evidential Regression, and Gaussian Mixture Models) on an H-transfer reaction. Ensembles achieved an outlier detection quality of up to 90%, while DER performed poorly because of limitations in its statistical assumptions.
Uncertainty quantification (UQ) to detect samples with large expected errors (outliers) is applied to reactive molecular potential energy surfaces (PESs). Three methods were applied to the H-transfer reaction between ${\it syn}$-Criegee and vinyl hydroperoxide: Ensembles, Deep Evidential Regression (DER), and Gaussian Mixture Models (GMM). The results indicate that ensembles are the best at detecting outliers, followed by GMM. For example, within the pool of 1000 structures with the largest uncertainty, the detection quality for outliers is $\sim 90$\% when 25 structures with large errors are sought and $\sim 50$\% when 1000 such structures are sought. In contrast, the limitations of the statistical assumptions underlying DER strongly degraded its predictive capability. Finally, a structure-based indicator was found to correlate with large average errors; it may help to rapidly identify new structures that are worth adding when refining the neural network.
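The abstract does not define how "detection quality" is computed. The sketch below assumes one plausible reading, which is our assumption and not the authors' stated procedure: the ensemble uncertainty of a structure is the population standard deviation of the member predictions, and detection quality is the fraction of the `n_sought` structures with the largest true errors that fall inside the pool of the `pool_size` most uncertain structures. All function names and the toy numbers are hypothetical.

```python
from __future__ import annotations

import statistics
from typing import Sequence


def ensemble_mean(member_predictions: Sequence[Sequence[float]]) -> list[float]:
    """Mean prediction per structure; member_predictions[m][i] is member m on structure i."""
    n_members = len(member_predictions)
    n_structures = len(member_predictions[0])
    return [sum(member[i] for member in member_predictions) / n_members
            for i in range(n_structures)]


def ensemble_uncertainty(member_predictions: Sequence[Sequence[float]]) -> list[float]:
    """Uncertainty per structure: population std. dev. across ensemble members (assumed choice)."""
    n_structures = len(member_predictions[0])
    return [statistics.pstdev([member[i] for member in member_predictions])
            for i in range(n_structures)]


def top_k_indices(values: Sequence[float], k: int) -> set[int]:
    """Indices of the k largest values; the stable sort breaks ties by index."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def detection_quality(uncertainty: Sequence[float], abs_error: Sequence[float],
                      pool_size: int, n_sought: int) -> float:
    """Fraction of the n_sought largest-error structures found among the
    pool_size most uncertain structures (hypothetical definition)."""
    pool = top_k_indices(uncertainty, pool_size)
    outliers = top_k_indices(abs_error, n_sought)
    return len(pool & outliers) / n_sought


if __name__ == "__main__":
    # Made-up toy data: 3 ensemble members predicting energies for 6 structures.
    preds = [
        [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
        [0.1, 1.0, 2.5, 3.0, 4.0, 6.0],
        [-0.1, 1.0, 1.5, 3.0, 4.0, 4.0],
    ]
    reference = [0.0, 1.0, 2.0, 3.0, 4.2, 7.0]
    unc = ensemble_uncertainty(preds)
    err = [abs(m - r) for m, r in zip(ensemble_mean(preds), reference)]
    print(detection_quality(unc, err, pool_size=2, n_sought=1))  # 1.0
    print(detection_quality(unc, err, pool_size=2, n_sought=2))  # 0.5
```

Under this reading, setting `pool_size=1000` with `n_sought` of 25 or 1000 would correspond to the two settings quoted in the abstract; the toy run shows the same qualitative trend, with quality dropping as more large-error structures are sought from a fixed pool.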