Machine Learning-Driven Crystal System Prediction for Perovskites Using Augmented X-ray Diffraction Data

arXiv:2602.04435v1 · 1 citation · h-index: 5 · Eng Appl Artif Intell
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for efficient structural characterization in materials science, particularly for perovskites used in photovoltaics and optoelectronics. It is incremental, however, because it applies existing methods to a specific domain.

The study predicts crystal systems, point groups, and space groups of perovskite materials from X-ray diffraction (XRD) data using a machine learning framework. Its best crystal-system model (Time Series Forest with SMOTE) reached a Matthews correlation coefficient (MCC) of 0.9 and an accuracy of 97.76%.

Predicting the crystal system from X-ray diffraction (XRD) spectra is a critical task in materials science, particularly for perovskite materials, which have diverse applications in photovoltaics, optoelectronics, and catalysis. In this study, we present a machine learning (ML)-driven framework that uses several models, including Time Series Forest (TSF), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and a simple feedforward neural network (NN), to classify crystal systems, point groups, and space groups from XRD data of perovskite materials. To address class imbalance and enhance model robustness, we integrated augmentation and class-rebalancing strategies such as Synthetic Minority Over-sampling Technique (SMOTE), class weighting, jittering, and spectrum shifting, along with efficient data preprocessing pipelines. The TSF model with SMOTE augmentation achieved strong performance for crystal system prediction, with a Matthews correlation coefficient (MCC) of 0.9, an F1 score of 0.92, and an accuracy of 97.76%. For point and space group prediction, balanced accuracies above 95% were obtained. The model performed particularly well on symmetry-distinct classes, including the cubic crystal system, point groups 3m and m-3m, and space groups Pnma and Pnnn. This work highlights the potential of ML for XRD-based structural characterization and the accelerated discovery of perovskite materials.
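
Time Series Forest, the best-performing model here, treats each XRD pattern as a one-dimensional series. In the standard TSF algorithm, each tree samples random intervals of the series, summarises every interval by its mean, standard deviation and slope, and is trained on those summary features; the forest then votes. The sketch below covers only that featurisation step, in NumPy. The function names `random_intervals` and `interval_features` and the `min_length` default are hypothetical, and the paper does not state which TSF implementation or hyperparameters were used. Library implementations such as sktime's `TimeSeriesForestClassifier` bundle the interval sampling with the tree ensemble.

```python
import numpy as np


def random_intervals(series_length, n_intervals, min_length=3, rng=None):
    """Draw (start, end) index pairs with end - start >= min_length."""
    rng = np.random.default_rng() if rng is None else rng
    intervals = []
    for _ in range(n_intervals):
        length = int(rng.integers(min_length, series_length + 1))
        start = int(rng.integers(0, series_length - length + 1))
        intervals.append((start, start + length))
    return intervals


def interval_features(X, intervals):
    """Mean, standard deviation and least-squares slope of every interval.

    X has shape (n_patterns, n_points); the result has shape
    (n_patterns, 3 * len(intervals)). Intervals need at least 2 points.
    """
    X = np.asarray(X, dtype=float)
    columns = []
    for start, end in intervals:
        seg = X[:, start:end]
        t = np.arange(end - start, dtype=float)
        t -= t.mean()  # centred time index
        slope = (seg - seg.mean(axis=1, keepdims=True)) @ t / (t @ t)
        columns.extend([seg.mean(axis=1), seg.std(axis=1), slope])
    return np.column_stack(columns)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((8, 3501))  # eight hypothetical patterns
    n_intervals = int(np.sqrt(X.shape[1]))  # sqrt(m) intervals, as in TSF
    feats = interval_features(X, random_intervals(X.shape[1], n_intervals, rng=rng))
    print(feats.shape)  # (8, 177)
```

Each tree in a full forest would draw its own intervals and fit a decision tree on the resulting feature matrix.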
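
Jittering and spectrum shifting perturb each XRD pattern to create additional training examples. The abstract gives no parameters, so everything in the sketch below is an assumption: patterns are intensity arrays on a uniform 2θ grid, jittering adds Gaussian noise proportional to the peak intensity (the `sigma` value is a guess), and shifting moves the whole pattern by a whole number of grid bins with zero fill. The names `jitter`, `shift` and `augment` are hypothetical, not the authors' code.

```python
import numpy as np


def jitter(pattern, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise scaled by the pattern's peak intensity.

    sigma is a hypothetical relative noise level, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    pattern = np.asarray(pattern, dtype=float)
    noise = rng.normal(0.0, sigma * pattern.max(), size=pattern.shape)
    # Clip so that augmented intensities stay non-negative.
    return np.clip(pattern + noise, 0.0, None)


def shift(pattern, k):
    """Shift a pattern by k bins along the 2-theta axis, zero-filling the gap.

    Positive k moves peaks to higher angles, negative k to lower angles.
    """
    pattern = np.asarray(pattern, dtype=float)
    out = np.zeros_like(pattern)
    if k > 0:
        out[k:] = pattern[:-k]
    elif k < 0:
        out[:k] = pattern[-k:]
    else:
        out[:] = pattern
    return out


def augment(pattern, rng, sigma=0.01, max_shift=5):
    """Apply a random shift of at most max_shift bins, then jitter."""
    k = int(rng.integers(-max_shift, max_shift + 1))
    return jitter(shift(pattern, k), sigma=sigma, rng=rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    two_theta = np.linspace(10.0, 80.0, 3501)  # hypothetical 0.02-degree grid
    pattern = np.exp(-((two_theta - 32.0) / 0.1) ** 2)  # one synthetic peak
    copies = np.stack([augment(pattern, rng) for _ in range(5)])
    print(copies.shape)  # (5, 3501)
```

Small shifts loosely mimic peak displacements such as those from sample height error or slight lattice-parameter changes, while jitter mimics counting noise; both leave the class label unchanged.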
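
SMOTE and class weighting address class imbalance from two sides. SMOTE synthesises new minority-class samples by interpolating between a sample and one of its k nearest same-class neighbours, while class weighting leaves the data unchanged and scales each class's contribution to the training loss. The sketch below is a small NumPy version of both, assuming each pattern (or its extracted feature vector) is one row of `X`; the abstract does not say whether SMOTE was applied to raw spectra or to derived features. `smote` and `balanced_class_weights` are hypothetical names. In practice, imbalanced-learn's `SMOTE(k_neighbors=5).fit_resample(X, y)` and scikit-learn's `class_weight="balanced"` option implement the same ideas; the weight formula below, n_samples / (n_classes × n_c), is the one scikit-learn documents for that option.

```python
import numpy as np


def smote(X, y, k=5, rng=None):
    """Oversample every class up to the size of the largest class.

    Each synthetic row lies on the segment between a randomly chosen sample
    and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        n_new = n_max - n
        if n_new == 0:
            continue
        Xc = X[y == cls]
        if len(Xc) == 1:
            # A lone sample has no neighbour to interpolate towards; copy it.
            synthetic = np.repeat(Xc, n_new, axis=0)
        else:
            kk = min(k, len(Xc) - 1)
            # Brute-force pairwise distances: fine for a sketch, O(n^2) memory.
            d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
            np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
            neighbours = np.argsort(d, axis=1)[:, :kk]
            base = rng.integers(0, len(Xc), size=n_new)
            partner = neighbours[base, rng.integers(0, kk, size=n_new)]
            gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
            synthetic = Xc[base] + gap * (Xc[partner] - Xc[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(n_new, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * n_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {c: float(w) for c, w in zip(classes.tolist(), weights)}
```

Only the training split should be resampled; generating synthetic samples before the train/test split would leak information into the evaluation.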
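
The abstract reports accuracy alongside MCC, F1 and balanced accuracy. On an imbalanced dataset, accuracy can be dominated by the majority classes, which is why imbalance-aware metrics are reported as well. The sketch below computes all four from a confusion matrix in NumPy, assuming integer class labels. It uses the multiclass (Gorodkin) form of MCC; macro averaging for F1 is an assumption, because the abstract does not say which averaging was used. scikit-learn's `matthews_corrcoef`, `f1_score` and `balanced_accuracy_score` provide the same quantities.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    C = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(C, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return C


def accuracy(C):
    return float(np.trace(C) / C.sum())


def balanced_accuracy(C):
    """Mean per-class recall over classes that occur in y_true."""
    support = C.sum(axis=1)
    present = support > 0
    return float(np.mean(np.diag(C)[present] / support[present]))


def macro_f1(C):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp = np.diag(C).astype(float)
    fp = C.sum(axis=0) - tp
    fn = C.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    present = denom > 0  # skip classes absent from both labels and predictions
    return float(np.mean(2 * tp[present] / denom[present]))


def mcc(C):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    C = C.astype(float)
    s = C.sum()        # number of samples
    c = np.trace(C)    # correctly classified samples
    t = C.sum(axis=1)  # true occurrences per class
    p = C.sum(axis=0)  # predictions per class
    denom = np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return 0.0 if denom == 0 else float((c * s - t @ p) / denom)


if __name__ == "__main__":
    C = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
    print(accuracy(C), balanced_accuracy(C), round(macro_f1(C), 4), round(mcc(C), 4))
    # 0.75 0.75 0.7333 0.5774
```

For two classes the multiclass MCC reduces to the familiar (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).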

