CHEM-PHOct 28, 2023
The Role of Reference Points in Machine-Learned Atomistic Simulation ModelsXiangyun Lei, Weike Ye, Joseph Montoya et al.
This paper introduces the Chemical Environment Modeling Theory (CEMT), a novel, generalized framework designed to overcome the limitations inherent in traditional atom-centered Machine Learning Force Field (MLFF) models, widely used in atomistic simulations of chemical systems. CEMT demonstrated enhanced flexibility and adaptability by allowing reference points to exist anywhere within the modeled domain and thus, enabling the study of various model architectures. Utilizing Gaussian Multipole (GMP) featurization functions, several models with different reference point sets, including finite difference grid-centered and bond-centered models, were tested to analyze the variance in capabilities intrinsic to models built on distinct reference points. The results underscore the potential of non-atom-centered reference points in force training, revealing variations in prediction accuracy, inference speed and learning efficiency. Finally, a unique connection between CEMT and real-space orbital-free finite element Density Functional Theory (FE-DFT) is established, and the implications include the enhancement of data efficiency and robustness. It allows the leveraging of spatially-resolved energy densities and charge densities from FE-DFT calculations, as well as serving as a pivotal step towards integrating known quantum-mechanical laws into the architecture of ML models.
LGNov 13, 2024
UniMat: Unifying Materials Embeddings through Multi-modal LearningJanghoon Ock, Joseph Montoya, Daniel Schweigert et al.
Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal learning, particularly in vision and language models, have opened new avenues for integrating data in different forms. In this work, we evaluate common techniques in multi-modal learning (alignment and fusion) in unifying some of the most important modalities in materials science: atomic structure, X-ray diffraction patterns (XRD), and composition. We show that structure graph modality can be enhanced by aligning with XRD patterns. Additionally, we show that aligning and fusing more experimentally accessible data formats, such as XRD patterns and compositions, can create more robust joint embeddings than individual modalities across various tasks. This lays the groundwork for future studies aiming to exploit the full potential of multi-modal data in materials science, facilitating more informed decision-making in materials design and discovery.
LGJun 27, 2025
XxaCT-NN: Structure Agnostic Multimodal Learning for Materials ScienceJithendaraa Subramanian, Linda Hung, Daniel Schweigert et al.
Recent advances in materials discovery have been driven by structure-based models, particularly those using crystal graphs. While effective for computational datasets, these models are impractical for real-world applications where atomic structures are often unknown or difficult to obtain. We propose a scalable multimodal framework that learns directly from elemental composition and X-ray diffraction (XRD) -- two of the more available modalities in experimental workflows without requiring crystal structure input. Our architecture integrates modality-specific encoders with a cross-attention fusion module and is trained on the 5-million-sample Alexandria dataset. We present masked XRD modeling (MXM), and apply MXM and contrastive alignment as self-supervised pretraining strategies. Pretraining yields faster convergence (up to 4.2x speedup) and improves both accuracy and representation quality. We further demonstrate that multimodal performance scales more favorably with dataset size than unimodal baselines, with gains compounding at larger data regimes. Our results establish a path toward structure-free, experimentally grounded foundation models for materials science.