CL, AI | Apr 24, 2025

Multilingual Performance Biases of Large Language Models in Education

Vansh Gupta, Sankalan Pal Chowdhury, Vilém Zouhar, Donya Rooein, Mrinmaya Sachan

arXiv:2504.17720v2 | 13.912 citations | h-index: 40

Originality: Incremental advance

AI Analysis

This paper highlights performance biases in LLMs used for educational applications in non-English languages, an incremental but practically important issue for practitioners in multilingual education.

The study evaluated popular large language models on four educational tasks in English and eight non-English languages. It found that performance roughly tracks how well each language is represented in training data, with frequent and significant drops from English in lower-resource languages.

Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain if their use in education settings in non-English languages is warranted. We evaluated the performance of popular LLMs on four educational tasks: identifying student misconceptions, providing targeted feedback, interactive tutoring, and grading translations in eight languages (Mandarin, Hindi, Arabic, German, Farsi, Telugu, Ukrainian, Czech) in addition to English. We find that the performance on these tasks somewhat corresponds to the amount of language represented in training data, with lower-resource languages having poorer task performance. Although the models perform reasonably well in most languages, the frequent performance drop from English is significant. Thus, we recommend that practitioners first verify that the LLM works well in the target language for their educational task before deployment.
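The recommendation to verify target-language performance before deployment can be illustrated with a minimal sketch. It assumes a practitioner already has one task score per language on a higher-is-better scale. The function names, the 10% threshold, and the scores below are hypothetical placeholders, not values or methods from the paper.

```python
from typing import Dict, List


def relative_drop(scores: Dict[str, float], baseline: str = "en") -> Dict[str, float]:
    """Return each language's fractional drop relative to the baseline score."""
    base = scores[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return {
        lang: (base - score) / base
        for lang, score in scores.items()
        if lang != baseline
    }


def flag_languages(
    scores: Dict[str, float], max_drop: float = 0.10, baseline: str = "en"
) -> List[str]:
    """List languages whose drop from the baseline exceeds max_drop (assumed threshold)."""
    drops = relative_drop(scores, baseline)
    return sorted(lang for lang, drop in drops.items() if drop > max_drop)


# Placeholder scores for illustration only; these are NOT results from the paper.
example_scores = {"en": 0.80, "de": 0.76, "te": 0.60}
print(flag_languages(example_scores))  # ['te']
```

A language that appears in the flagged list would call for closer inspection on the specific educational task before the model is deployed in that language. The appropriate threshold depends on the task and is left to the practitioner.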
