Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
This work addresses the limited support for multimodal learning in materials science, offering researchers a new dataset and framework that advance benchmarking and annotation practices, though the contribution appears incremental because it builds on existing data and methods.
The paper tackles the restriction of most materials science datasets to atomic geometries by introducing MultiCrystalSpectrumSet (MCS-Set), a curated framework that integrates atomic structures with 2D projections and textual annotations, enabling multimodal property prediction and constrained crystal generation with partial cluster supervision.
Most materials science datasets are limited to atomic geometries (e.g., XYZ files), restricting their utility for multimodal learning and comprehensive data-centric analysis. These constraints have historically impeded the adoption of advanced machine learning techniques in the field. This work introduces MultiCrystalSpectrumSet (MCS-Set), a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations, including lattice parameters and coordination metrics. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision. Leveraging a human-in-the-loop pipeline, MCS-Set combines domain expertise with standardized descriptors for high-quality annotation. Evaluations using state-of-the-art language and vision-language models reveal substantial modality-specific performance gaps and highlight the importance of annotation quality for generalization. MCS-Set offers a foundation for benchmarking multimodal models, advancing annotation practices, and promoting accessible, versatile materials science datasets. The dataset and implementations are available at https://github.com/KurbanIntelligenceLab/MultiCrystalSpectrumSet.
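The structured textual annotations mentioned in the abstract (lattice parameters, coordination metrics) can be derived directly from the geometry. The sketch below shows one way a single multimodal record might pair atomic positions, a 2D view, and a text annotation. It is not MCS-Set's actual schema or pipeline. The `MultimodalRecord` class and its fields, the xy projection standing in for a rendered 2D image, the neighbour cutoff, and the annotation template are all hypothetical. Coordination is counted naively in a finite cluster, without periodic images.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float, float]


def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees (cosine clamped for float safety)."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def lattice_parameters(a1: Vec, a2: Vec, a3: Vec) -> dict[str, float]:
    """Convert three lattice vectors to (a, b, c, alpha, beta, gamma).

    Standard convention: alpha = angle(b, c), beta = angle(a, c), gamma = angle(a, b).
    """
    return {
        "a": math.hypot(*a1), "b": math.hypot(*a2), "c": math.hypot(*a3),
        "alpha": angle_deg(a2, a3), "beta": angle_deg(a1, a3), "gamma": angle_deg(a1, a2),
    }


def coordination_numbers(positions: list[Vec], cutoff: float) -> list[int]:
    """Count neighbours within `cutoff` for each atom.

    Simplification (assumption): the structure is treated as a finite cluster
    with no periodic images, as a bare XYZ file would be read.
    """
    return [
        sum(1 for j, q in enumerate(positions) if j != i and math.dist(p, q) <= cutoff)
        for i, p in enumerate(positions)
    ]


def project_xy(positions: list[Vec]) -> list[tuple[float, float]]:
    """Orthographic projection onto the xy-plane (hypothetical stand-in for a 2D image)."""
    return [(x, y) for x, y, _ in positions]


@dataclass
class MultimodalRecord:
    """Hypothetical record pairing geometry, a 2D view, and a text annotation."""
    symbols: list[str]
    positions: list[Vec]
    lattice: tuple[Vec, Vec, Vec]

    def projection(self) -> list[tuple[float, float]]:
        return project_xy(self.positions)

    def text_annotation(self, cutoff: float) -> str:
        # Build a templated summary from lattice parameters and mean coordination.
        p = lattice_parameters(*self.lattice)
        cn = coordination_numbers(self.positions, cutoff)
        mean_cn = sum(cn) / len(cn)
        return (
            f"{len(self.symbols)} atoms; "
            f"a={p['a']:.3f} b={p['b']:.3f} c={p['c']:.3f} Å; "
            f"alpha={p['alpha']:.1f} beta={p['beta']:.1f} gamma={p['gamma']:.1f} deg; "
            f"mean coordination {mean_cn:.2f} (cutoff {cutoff} Å)"
        )


if __name__ == "__main__":
    # Toy 4-atom square in a cubic 4 Å cell (illustrative values only).
    rec = MultimodalRecord(
        symbols=["Si"] * 4,
        positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
        lattice=((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)),
    )
    print(rec.projection())
    print(rec.text_annotation(cutoff=1.1))
```

A production pipeline would more likely rely on an established structure library with periodic neighbour search; the plain-Python version simply keeps the arithmetic behind each descriptor visible.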