Benchmarking bandgap prediction in semiconductors under experimental and realistic evaluation settings
For materials scientists, this benchmark addresses the gap between computational predictions and experimental reality, but the contribution is primarily evaluative rather than methodological.
The paper introduces RealMat-BaG, a benchmark for bandgap prediction that evaluates models under experimentally relevant conditions, revealing fundamental generalization limitations of current models. No concrete performance numbers are provided.
Accurate bandgap prediction is crucial for semiconductor applications, yet machine learning models trained on computational data often struggle to generalize to experimental bandgap measurements. Challenges related to data fidelity, domain generalization, and model interpretability remain insufficiently addressed in existing evaluation frameworks. To bridge this gap, we introduce RealMat-BaG, a benchmark for assessing model reliability under experimentally relevant conditions. We curate an open-access dataset of experimental bandgaps with aligned crystal structures and compare graph neural networks as well as classical machine learning baselines. Our framework evaluates performance across statistical and domain-based splits, examines transfer from DFT-computed to experimental bandgaps, and analyzes interpretability at both elemental-property and structural levels. Our results reveal the fundamental generalization limitations of current bandgap prediction models and establish a benchmark aligned with experimental measurements for developing more reliable learning strategies for materials discovery.
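The distinction between statistical and domain-based splits can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the RealMat-BaG code or data: the record fields (`domain`, `gap_ev`), the helper names, the mean-value baseline, and the synthetic bandgap values. A random split mixes all domains into both train and test sets, whereas a domain-based split withholds one domain entirely, which typically exposes a larger error.

```python
import random

# Synthetic toy records (NOT real measurements): two hypothetical domains
# "A" and "B" whose bandgap values lie in clearly different ranges.
records = (
    [{"id": f"a{i}", "domain": "A", "gap_ev": 1.0 + 0.1 * i} for i in range(5)]
    + [{"id": f"b{i}", "domain": "B", "gap_ev": 3.0 + 0.1 * i} for i in range(5)]
)


def random_split(data, test_fraction=0.2, seed=0):
    """Statistical split: shuffle, then hold out a fixed fraction."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def domain_split(data, held_out_domain):
    """Domain-based split: hold out every record from one domain."""
    train = [r for r in data if r["domain"] != held_out_domain]
    test = [r for r in data if r["domain"] == held_out_domain]
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_mean_baseline(train, test):
    """Hypothetical stand-in model: predict the training-set mean bandgap."""
    mean_gap = sum(r["gap_ev"] for r in train) / len(train)
    targets = [r["gap_ev"] for r in test]
    return mean_absolute_error(targets, [mean_gap] * len(targets))


random_train, random_test = random_split(records)
domain_train, domain_test = domain_split(records, held_out_domain="B")

random_mae = evaluate_mean_baseline(random_train, random_test)
domain_mae = evaluate_mean_baseline(domain_train, domain_test)

print(f"random-split MAE: {random_mae:.2f} eV")
print(f"domain-split MAE: {domain_mae:.2f} eV")
```

In practice the baseline would be replaced by the graph neural network or classical model under study, and the domain label by whatever grouping the benchmark defines; the sketch only shows why the two kinds of split can yield very different error estimates for the same model.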