Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018
This study addresses the problem of understanding the factors that affect students' mathematics performance, for the benefit of educational policymakers and researchers. Its contribution is incremental, since it applies existing XAI methods to new data.
This study applied explainable AI techniques to PISA 2018 data to predict and understand mathematics achievement across ten countries. Non-linear models such as Random Forest outperformed linear ones, and key predictors included socio-economic status and study time. Performance was measured with R^2 and MAE.
Understanding the factors that shape students' mathematics performance is vital for designing effective educational policies. This study applies explainable artificial intelligence (XAI) techniques to PISA 2018 data to predict mathematics achievement and identify key predictors across ten countries (67,329 students). We tested four models, Multiple Linear Regression (MLR), Random Forest (RF), CatBoost, and Artificial Neural Networks (ANN), using student, family, and school variables. Models were trained on 70% of the data (with 5-fold cross-validation) and tested on the remaining 30%, with the split stratified by country. Performance was assessed with R^2 and Mean Absolute Error (MAE). To ensure interpretability, we used feature importance, SHAP values, and decision tree visualizations. Non-linear models, especially RF and ANN, outperformed MLR, with RF balancing accuracy and generalizability. Key predictors included socio-economic status, study time, teacher motivation, and students' attitudes toward mathematics, though their impact varied across countries. Visual diagnostics such as scatterplots of predicted versus actual scores showed that RF and CatBoost predictions aligned closely with actual performance. The findings highlight the non-linear and context-dependent nature of achievement and the value of XAI in educational research. This study uncovers cross-national patterns, informs equity-focused reforms, and supports the development of personalized learning strategies.
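The evaluation protocol described above (a 70/30 split stratified by country, scored with R^2 and MAE) can be illustrated with a minimal, standard-library-only sketch. This is not the study's code: the data are synthetic, the country labels "A", "B", "C", the field names `ses` and `score`, and the helpers `stratified_split`, `fit_line`, `r2`, and `mae` are all hypothetical, and a one-variable least-squares line stands in for the four models the study actually compared. Cross-validation and the XAI steps are omitted.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_frac=0.3, seed=0):
    """Split rows so that each stratum (e.g. country) sends ~test_frac to the test set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    train, test = [], []
    for members in groups.values():
        members = members[:]          # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Synthetic stand-in for student records (NOT PISA data).
rng = random.Random(42)
rows = [{"country": c, "ses": rng.gauss(0, 1)} for c in ("A", "B", "C") for _ in range(100)]
for row in rows:
    row["score"] = 500 + 40 * row["ses"] + rng.gauss(0, 20)

train, test = stratified_split(rows, key="country", test_frac=0.3, seed=1)

# Fit on the training portion only, then score on the held-out portion.
slope, intercept = fit_line([r["ses"] for r in train], [r["score"] for r in train])
y_test = [r["score"] for r in test]
y_pred = [intercept + slope * r["ses"] for r in test]

# Baseline for comparison: always predict the training-set mean score.
train_mean = sum(r["score"] for r in train) / len(train)
y_base = [train_mean] * len(test)

print(f"linear   R^2={r2(y_test, y_pred):.3f}  MAE={mae(y_test, y_pred):.1f}")
print(f"baseline R^2={r2(y_test, y_base):.3f}  MAE={mae(y_test, y_base):.1f}")
```

Splitting within each country keeps every country's share of the test set at roughly 30%, so a large sample from one country cannot dominate the held-out scores. Reporting R^2 next to MAE gives both a relative measure (variance explained) and an absolute one (average error in score points).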