Facts Do Care About Your Language: Assessing Answer Quality of Multilingual LLMs
This work addresses the problem of ensuring correctness in educational tools for non-English speakers, but it is incremental: it evaluates existing models rather than proposing new solutions.
The study evaluated the factuality of Llama3.1 models in answering factual questions for middle and high school students across multiple languages, finding that they provide extraneous and less truthful information and exacerbate biases against rare languages.
Factuality is a necessary precursor to useful educational tools. As adoption of Large Language Models (LLMs) in education continues to grow, ensuring correctness in all settings is paramount. Despite their strong English capabilities, LLM performance in other languages is largely untested. In this work, we evaluate the correctness of the Llama3.1 family of models in answering factual questions appropriate for middle and high school students. We demonstrate that LLMs not only provide extraneous and less truthful information, but also exacerbate existing biases against rare languages.