Amit Samanta

LG
h-index22
3papers
4citations
Novelty52%
AI Score30

3 Papers

LGFeb 1, 2024
LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields

Joshua A. Vita, Amit Samanta, Fei Zhou et al.

Model ensembles are effective tools for estimating prediction uncertainty in deep learning atomistic force fields. However, their widespread adoption is hindered by high computational costs and overconfident error estimates. In this work, we address these challenges by leveraging distributions of per-sample errors obtained during training and employing a distance-based similarity search in the model latent space. Our method, which we call LTAU, efficiently estimates the full probability distribution function (PDF) of errors for any test point using the logged training errors, achieving speeds that are 2--3 orders of magnitudes faster than typical ensemble methods and allowing it to be used for tasks where training or evaluating multiple models would be infeasible. We apply LTAU towards estimating parametric uncertainty in atomistic force fields (LTAU-FF), demonstrating that its improved ensemble diversity produces well-calibrated confidence intervals and predicts errors that correlate strongly with the true errors for data near the training domain. Furthermore, we show that the errors predicted by LTAU-FF can be used in practical applications for detecting out-of-domain data, tuning model performance, and predicting failure during simulations. We believe that LTAU will be a valuable tool for uncertainty quantification (UQ) in atomistic force fields and is a promising method that should be further explored in other domains of machine learning.

LGSep 15, 2025
Unsupervised Atomic Data Mining via Multi-Kernel Graph Autoencoders for Machine Learning Force Fields

Hong Sun, Joshua A. Vita, Amit Samanta et al.

Constructing a chemically diverse dataset while avoiding sampling bias is critical to training efficient and generalizable force fields. However, in computational chemistry and materials science, many common dataset generation techniques are prone to oversampling regions of the potential energy surface. Furthermore, these regions can be difficult to identify and isolate from each other or may not align well with human intuition, making it challenging to systematically remove bias in the dataset. While traditional clustering and pruning (down-sampling) approaches can be useful for this, they can often lead to information loss or a failure to properly identify distinct regions of the potential energy surface due to difficulties associated with the high dimensionality of atomic descriptors. In this work, we introduce the Multi-kernel Edge Attention-based Graph Autoencoder (MEAGraph) model, an unsupervised approach for analyzing atomic datasets. MEAGraph combines multiple linear kernel transformations with attention-based message passing to capture geometric sensitivity and enable effective dataset pruning without relying on labels or extensive training. Demonstrated applications on niobium, tantalum, and iron datasets show that MEAGraph efficiently groups similar atomic environments, allowing for the use of basic pruning techniques for removing sampling bias. This approach provides an effective method for representation learning and clustering that can be used for data analysis, outlier detection, and dataset optimization.

MTRL-SCIApr 27, 2025
Composable and adaptive design of machine learning interatomic potentials guided by Fisher-information analysis

Weishi Wang, Mark K. Transtrum, Vincenzo Lordi et al.

An adaptive physics-informed model design strategy for machine-learning interatomic potentials (MLIPs) is proposed. This strategy follows an iterative reconfiguration of composite models from single-term models, followed by a unified training procedure. A model evaluation method based on the Fisher information matrix (FIM) and multiple-property error metrics is proposed to guide model reconfiguration and hyperparameter optimization. Combining the model reconfiguration and the model evaluation subroutines, we provide an adaptive MLIP design strategy that balances flexibility and extensibility. In a case study of designing models against a structurally diverse niobium dataset, we managed to obtain an optimal configuration with 75 parameters generated by our framework that achieved a force RMSE of 0.172 eV/Å and an energy RMSE of 0.013 eV/atom.