Investigating Hallucination in Conversations for Low Resource Languages
This work addresses hallucination for users of low-resource languages, but the contribution is incremental: it extends existing English-focused research to new languages.
The study examines hallucination in large language models by analyzing conversational data in Hindi, Farsi, and Mandarin, finding that models such as GPT-3.5 and Llama-3.1 produce significantly more hallucinations in Hindi and Farsi than in Mandarin.
Large Language Models (LLMs) have demonstrated remarkable proficiency in generating text that closely resembles human writing. However, they often generate factually incorrect statements, a problem typically referred to as 'hallucination'. Addressing hallucination is crucial for enhancing the reliability and effectiveness of LLMs. While much research has focused on hallucinations in English, our study extends this investigation to conversational data in three languages: Hindi, Farsi, and Mandarin. We offer a comprehensive analysis of a dataset to examine both factual and linguistic errors in these languages for GPT-3.5, GPT-4o, Llama-3.1, Gemma-2.0, DeepSeek-R1, and Qwen-3. We found that LLMs produce very few hallucinated responses in Mandarin but generate a significantly higher number of hallucinations in Hindi and Farsi.