MTRL-SCI | LG | COMP-PH | Aug 21, 2025

A simulation-based training framework for machine-learning applications in ARPES

arXiv:2508.15983v1 | h-index: 43 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the training-data bottleneck facing ARPES experimentalists and enables more efficient ML applications, though it is incremental in that it applies existing simulation methods to a specific domain.

The authors tackled the lack of training data for machine learning in ARPES by developing an open-source synthetic spectra simulator, and demonstrated that a model trained on simulated data can assess spectra quality more accurately than human analysis and identify optimal measurement regions with high precision.

In recent years, angle-resolved photoemission spectroscopy (ARPES) has advanced significantly in its ability to probe more observables and simultaneously generate multi-dimensional datasets. These advances present new challenges in data acquisition, processing, and analysis. Machine learning (ML) models can drastically reduce the workload of experimentalists; however, the lack of training data for ML -- and in particular deep learning -- is a significant obstacle. In this work, we introduce an open-source synthetic ARPES spectra simulator, aurelia, for the purpose of generating the large datasets necessary to train ML models. As a demonstration, we train a convolutional neural network to evaluate ARPES spectra quality -- a critical task performed during the initial sample alignment phase of the experiment. We benchmark the simulation-trained model against actual experimental data and find that it can assess the spectra quality more accurately than human analysis, and swiftly identify the optimal measurement region with high precision. Thus, we establish that simulated ARPES spectra can be an effective proxy for experimental spectra in training ML models.
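To make the idea of a synthetic spectrum concrete, here is a minimal sketch of how an ARPES-like intensity map could be generated. It is not the aurelia API and not the authors' method. The function names, the parabolic band, the Lorentzian lineshape, the Gaussian approximation to counting noise, and all parameter values are assumptions chosen for illustration. It uses only the standard library.

```python
import math
import random

K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, temperature_k=20.0):
    """Occupation at energy_ev measured relative to the Fermi level."""
    x = energy_ev / (K_B_EV_PER_K * temperature_k)
    if x > 700.0:  # avoid overflow in exp() far above the Fermi level
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


def simulate_spectrum(n_k=64, n_e=64, counts_scale=50.0,
                      linewidth_ev=0.03, seed=0):
    """Return (k_axis, e_axis, image); image rows are energies, columns momenta.

    Hypothetical model: one parabolic band, Lorentzian broadening,
    Fermi-Dirac cutoff, and Gaussian-approximated counting noise.
    """
    rng = random.Random(seed)
    k_axis = [-1.0 + 2.0 * i / (n_k - 1) for i in range(n_k)]  # 1/Angstrom
    e_axis = [-0.5 + 0.6 * j / (n_e - 1) for j in range(n_e)]  # eV vs E_F
    image = []
    for e in e_axis:
        row = []
        for k in k_axis:
            band = 1.0 * k * k - 0.3  # assumed dispersion E(k) in eV
            lorentz = (linewidth_ev / math.pi) / (
                (e - band) ** 2 + linewidth_ev ** 2)
            mean = counts_scale * lorentz * fermi_dirac(e)
            # Gaussian stand-in for Poisson noise, clipped at zero counts
            row.append(max(0.0, rng.gauss(mean, math.sqrt(mean))))
        image.append(row)
    return k_axis, e_axis, image


if __name__ == "__main__":
    _, _, img = simulate_spectrum(n_k=32, n_e=32, counts_scale=20.0, seed=1)
    print(len(img), len(img[0]))
```

In a sketch like this, lowering `counts_scale` or raising `linewidth_ev` yields noisier or blurrier maps, so those parameters could serve as a stand-in quality label when building a training set. That labeling choice is an assumption here, not something the abstract specifies.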
