CL · Jun 8, 2024

Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation

Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral

arXiv:2406.05494v1 · 13.523 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical shortcoming of LLMs on tasks that require logical reasoning and nuanced language understanding, but it is an incremental advance because it builds on existing research on hallucinations.

The paper tackles hallucinations in large language models (LLMs) on tasks involving negation. It shows that models such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on tasks such as false premise completion and fact generation, and it studies strategies to mitigate these hallucinations.

Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation: 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect of 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address this gap and focus specifically on the impact of negation on LLM hallucinations. We study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation, which underlines a critical shortcoming of these models. To address this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.
