XxaCT-NN: Structure Agnostic Multimodal Learning for Materials Science
This work addresses a common obstacle for materials scientists, the frequent unavailability of atomic structures, and thereby enables more practical real-world applications.
The paper tackles materials discovery by proposing a multimodal framework that learns from elemental composition and X-ray diffraction (XRD) without requiring crystal structure input; self-supervised pretraining yields up to 4.2x faster convergence and improves both accuracy and representation quality.
Recent advances in materials discovery have been driven by structure-based models, particularly those using crystal graphs. While effective for computational datasets, these models are impractical for real-world applications where atomic structures are often unknown or difficult to obtain. We propose a scalable multimodal framework that learns directly from elemental composition and X-ray diffraction (XRD), two of the more readily available modalities in experimental workflows, without requiring crystal structure input. Our architecture integrates modality-specific encoders with a cross-attention fusion module and is trained on the 5-million-sample Alexandria dataset. We introduce masked XRD modeling (MXM) and apply it, together with contrastive alignment, as self-supervised pretraining strategies. Pretraining yields faster convergence (up to 4.2x speedup) and improves both accuracy and representation quality. We further demonstrate that multimodal performance scales more favorably with dataset size than unimodal baselines, with gains compounding at larger data regimes. Our results establish a path toward structure-free, experimentally grounded foundation models for materials science.
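The abstract names three ingredients without giving their details: cross-attention fusion, masked XRD modeling, and contrastive alignment. The NumPy sketch below illustrates the general idea behind each one; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the patch size, the mask ratio, the temperature, the use of zeroing to hide patches, the mean-squared-error reconstruction loss, the symmetric InfoNCE form of the contrastive loss, and the choice of composition tokens as attention queries. Learned projections and the encoders themselves are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head cross-attention with no learned projections (assumption).

    queries:     (n_q, d) tokens from one modality, e.g. composition.
    keys_values: (n_kv, d) tokens from the other modality, e.g. XRD patches.
    Returns (n_q, d): each query token as a weighted mix of the other modality.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values

def mask_patches(pattern, patch_size, mask_ratio, rng):
    """Split a 1D XRD pattern into patches and zero out a random subset.

    The pattern length must be divisible by patch_size.
    Returns (patches, masked, mask); mask[i] is True for hidden patches.
    """
    patches = pattern.reshape(-1, patch_size)
    n = patches.shape[0]
    n_masked = max(1, int(round(mask_ratio * n)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_masked, replace=False)] = True
    masked = patches.copy()
    masked[mask] = 0.0
    return patches, masked, mask

def masked_reconstruction_loss(predicted, target, mask):
    # Mean squared error computed over the hidden patches only.
    return float(np.mean((predicted[mask] - target[mask]) ** 2))

def contrastive_loss(comp_emb, xrd_emb, temperature=0.1):
    """Symmetric InfoNCE; row i of both inputs describes the same material."""
    a = comp_emb / np.linalg.norm(comp_emb, axis=1, keepdims=True)
    b = xrd_emb / np.linalg.norm(xrd_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    # Composition-to-XRD direction normalizes rows, XRD-to-composition columns.
    loss_ab = -np.log(softmax(logits, axis=1)[idx, idx]).mean()
    loss_ba = -np.log(softmax(logits, axis=0)[idx, idx]).mean()
    return float((loss_ab + loss_ba) / 2)

# Toy usage with random data standing in for real encoder outputs.
rng = np.random.default_rng(0)
pattern = rng.random(64)                       # hypothetical XRD intensities
patches, masked, mask = mask_patches(pattern, patch_size=8, mask_ratio=0.5, rng=rng)
comp_tokens = rng.normal(size=(4, 8))          # hypothetical composition tokens
fused = cross_attention(comp_tokens, masked)   # shape (4, 8)
```

In a trained model, the reconstruction loss would be applied to a decoder's predictions for the hidden patches, and the contrastive loss to pooled embeddings from the two modality-specific encoders; both steps are left out here because the abstract does not specify them.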