LG | MTRL-SCI | Apr 12, 2025

MatWheel: Addressing Data Scarcity in Materials Science Through Synthetic Data

arXiv:2504.09152v1 | 1 citation | h-index: 3
Originality: Synthesis-oriented
AI Analysis

The paper addresses data scarcity for materials science researchers, but its contribution is incremental: it applies existing generative methods to a new domain.

The paper tackles data scarcity in materials science by proposing the MatWheel framework, which uses synthetic data from a conditional generative model to train property prediction models. On two data-scarce datasets, the resulting performance is close to or exceeds that obtained with real samples.

Data scarcity and the high cost of annotation are persistent challenges in materials science. Inspired by the potential of synthetic data in other fields such as computer vision, we propose the MatWheel framework, which trains a material property prediction model on synthetic data generated by a conditional generative model. We explore two scenarios: fully-supervised and semi-supervised learning. Using CGCNN for property prediction and Con-CDVAE as the conditional generative model, we conduct experiments on two data-scarce material property datasets from the Matminer database. The results show that synthetic data has potential in extremely data-scarce scenarios, achieving performance close to or exceeding that of real samples on both tasks. We also find that pseudo-labels have little impact on the quality of the generated data. Future work will integrate more advanced models and optimize generation conditions to boost the effectiveness of the materials data flywheel.
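
The loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the least-squares line stands in for CGCNN, the inverse line fit with added noise stands in for Con-CDVAE, and the one-dimensional "materials", the function names (`fit_line`, `train_generator`, `sample_synthetic`, `run_flywheel`), and the exact ordering of steps are all assumptions made for illustration. The sketch only shows how the fully-supervised and semi-supervised scenarios differ: in the latter, the generator is also trained on pseudo-labeled data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mae(model, xs, ys):
    """Mean absolute error of the model on (xs, ys)."""
    preds = predict(model, xs)
    return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)


def train_generator(xs, ys):
    # Toy stand-in for a conditional generative model (assumption):
    # learn to map a target property y back to a feature x.
    return fit_line(ys, xs)


def sample_synthetic(generator, targets, rng, noise=0.05):
    # Generate one noisy feature per target property value.
    slope, intercept = generator
    xs = [slope * y + intercept + rng.gauss(0.0, noise) for y in targets]
    return xs, list(targets)


def run_flywheel(semi_supervised, seed=0):
    rng = random.Random(seed)

    def true_property(x):
        # Hypothetical ground-truth relation for the toy data.
        return 2.0 * x + 1.0

    # Small labeled set, larger unlabeled pool, clean test set.
    x_lab = [rng.uniform(0.0, 1.0) for _ in range(10)]
    y_lab = [true_property(x) + rng.gauss(0.0, 0.1) for x in x_lab]
    x_unl = [rng.uniform(0.0, 1.0) for _ in range(100)]
    x_test = [rng.uniform(0.0, 1.0) for _ in range(200)]
    y_test = [true_property(x) for x in x_test]

    # 1. Train the property predictor on the real labeled data.
    predictor = fit_line(x_lab, y_lab)
    real_only = mae(predictor, x_test, y_test)

    # 2. Semi-supervised only: pseudo-label the unlabeled pool.
    gen_x, gen_y = list(x_lab), list(y_lab)
    if semi_supervised:
        gen_x += x_unl
        gen_y += predict(predictor, x_unl)

    # 3. Train the conditional generator, then sample synthetic data
    #    conditioned on target property values.
    generator = train_generator(gen_x, gen_y)
    targets = [rng.uniform(min(y_lab), max(y_lab)) for _ in range(50)]
    x_syn, y_syn = sample_synthetic(generator, targets, rng)

    # 4. Retrain the predictor on real plus synthetic data.
    retrained = fit_line(x_lab + x_syn, y_lab + y_syn)
    return {
        "real_only_mae": real_only,
        "real_plus_synthetic_mae": mae(retrained, x_test, y_test),
        "n_synthetic": len(x_syn),
    }


if __name__ == "__main__":
    for flag in (False, True):
        print("semi_supervised =", flag, run_flywheel(flag))
```

Running the script prints the test error before and after adding synthetic data for each scenario. The numbers from this toy setup say nothing about the paper's results; they only make the data flow concrete.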

