MTRL-SCIDec 5, 2022
Score-based denoising for atomic structure identificationTim Hsu, Babak Sadigh, Nicolas Bertin et al.
We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly reveal underlying crystal order while retaining disorder associated with crystal defects. Purely geometric, agnostic to interatomic potentials, and trained without inputs from explicit simulations, our denoiser can be applied to simulation data generated from vastly different interatomic interactions. The denoiser is shown to improve existing classification methods such as common neighbor analysis and polyhedral template matching, reaching perfect classification accuracy on a recent benchmark dataset of thermally perturbed structures up to the melting point. Demonstrated here in a wide variety of atomistic simulation contexts, the denoiser is general, robust, and readily extendable to delineate order from disorder in structurally and chemically complex materials.
COMP-PHOct 2, 2023
Score dynamics: scaling molecular dynamics with picoseconds timestep via conditional diffusion modelTim Hsu, Babak Sadigh, Vasily Bulatov et al.
We propose score dynamics (SD), a general framework for learning accelerated evolution operators with large timesteps from molecular-dynamics simulations. SD is centered around scores, or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to generate discrete transitions of the dynamical variables in an SD timestep, which can be orders of magnitude larger than a typical MD timestep. In this work, we construct graph neural network based score dynamics models of realistic molecular systems that are evolved with 10~ps timesteps. We demonstrate the efficacy of score dynamics with case studies of alanine dipeptide and short alkanes in aqueous solution. Both equilibrium predictions derived from the stationary distributions of the conditional probability and kinetic predictions for the transition rates and transition paths are in good agreement with MD. Our current SD implementation is about two orders of magnitude faster than the MD counterpart for the systems studied in this work. Open challenges and possible future remedies to improve score dynamics are also discussed.
MTRL-SCIAug 28, 2024
Grand canonical generative diffusion model for crystalline phases and grain boundariesBo Lei, Enze Chen, Hyuna Kwon et al.
The diffusion model has emerged as a powerful tool for generating atomic structures for materials science. This work calls attention to the deficiency of current particle-based diffusion models, which represent atoms as a point cloud, in generating even the simplest ordered crystalline structures. The problem is attributed to particles being trapped in local minima during the score-driven simulated annealing of the diffusion process, similar to the physical process of force-driven simulated annealing. We develop a solution, the grand canonical diffusion model, which adopts an alternative voxel-based representation with continuous rather than fixed number of particles. The method is applied towards generation of several common crystalline phases as well as the technologically important and challenging problem of grain boundary structures.
MTRL-SCIApr 18, 2024
Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theoryDaniel Schwalbe-Koda, Sebastien Hamel, Babak Sadigh et al.
An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
MTRL-SCIDec 11, 2025
A probabilistic foundation model for crystal structure denoising, phase classification, and order parametersHyuna Kwon, Babak Sadigh, Sebastien Hamel et al.
Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and interpretable remains challenging. Existing tools such as PTM and CNA are restricted to a small set of hand-crafted lattices (e.g.\ FCC/BCC/HCP), degrade under strong thermal disorder or defects, and produce hard, template-based labels without per-atom probability or confidence scores. Here we introduce a log-probability foundation model that unifies denoising, phase classification, and OP extraction within a single probabilistic framework. We reuse the MACE-MP foundation interatomic potential on crystal structures mapped to AFLOW prototypes, training it to predict per-atom, per-phase logits $l$ and to aggregate them into a global log-density $\log \hat{P}_θ(\boldsymbol{r})$ whose gradient defines a conservative score field. Denoising corresponds to gradient ascent on this learned log-density, phase labels follow from $\arg\max_c l_{ac}$, and the $l$ values act as continuous, defect-sensitive and interpretable OPs quantifying the Euclidean distance to ideal phases. We demonstrate universality across hundreds of prototypes, robustness under strong thermal and defect-induced disorder, and accurate treatment of complex systems such as ice polymorphs, ice--water interfaces, and shock-compressed Ti.