CL · Oct 23, 2024

Multilingual Hallucination Gaps in Large Language Models

Cléa Chataigner, Afaf Taïk, Golnoosh Farnadi

arXiv:2410.18270v16.16 citations · h-index: 17

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of unreliable multilingual text generation for users who rely on LLMs as search alternatives. Its contribution is incremental, however, since it extends existing evaluation methods to a multilingual context.

The study measures how often large language models generate false information (hallucinations) in different languages. Using models from the LLaMA, Qwen, and Aya families to generate biographies in 19 languages, it finds that hallucination rates vary, particularly between high- and low-resource languages.

Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
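To make the idea of a hallucination gap concrete, here is a minimal sketch in Python. It assumes a FactScore-style score is the fraction of atomic facts in a generated biography that a knowledge source supports, and it treats a gap as the difference between a reference language's mean score and another language's mean score. That gap definition, the function names, and the toy judgments are illustrative assumptions and are not taken from the paper. The hard steps (splitting text into atomic facts and checking each one against Wikipedia) are assumed to have already produced the boolean judgments.

```python
from statistics import mean


def fact_score(supported_flags):
    """Fraction of atomic facts judged supported; None if there are no facts."""
    if not supported_flags:
        return None
    return sum(supported_flags) / len(supported_flags)


def per_language_scores(judgments):
    """Map each language to the mean score over its generated biographies.

    judgments: {language: [[bool, ...] per biography]} (hypothetical layout).
    Biographies with no extracted facts are skipped.
    """
    scores = {}
    for lang, bios in judgments.items():
        vals = [s for s in (fact_score(b) for b in bios) if s is not None]
        if vals:
            scores[lang] = mean(vals)
    return scores


def hallucination_gaps(scores, reference_lang):
    """Assumed gap: reference score minus each other language's score."""
    ref = scores[reference_lang]
    return {lang: ref - s for lang, s in scores.items() if lang != reference_lang}


# Toy judgments, invented purely for illustration (not data from the paper).
toy = {
    "en": [[True, True, True, False], [True, True]],
    "fr": [[True, False], [True, True, False, False]],
    "sw": [[False, False, True, False], []],
}
scores = per_language_scores(toy)
gaps = hallucination_gaps(scores, "en")
print(scores)
print(gaps)
```

A positive gap here means the language scores lower than the reference, that is, a larger share of its generated facts went unsupported. Which reference language to use, and whether to average per biography or pool all facts, are design choices the sketch does not settle.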
