CL · AI · Feb 16, 2024

Strong hallucinations from negation and how to fix them

arXiv:2402.10543v2 · 27 citations · h-index: 4 · ACL
Originality: Incremental advance
AI Analysis

This paper addresses a critical issue in language model reasoning and offers a method to reduce negation-related hallucinations without training on sparse negative data. The advance is incremental because it focuses specifically on negation.

The paper tackles the problem of language models producing strong hallucinations, that is, responses that cannot possibly be true because they stem from logical incoherence, particularly around negation. It introduces a solution that treats negation as an operation over latent representations, which improves performance on cloze prompting and natural language inference tasks without requiring training on sparse negative data.

Despite great performance on many tasks, language models (LMs) still struggle with reasoning, sometimes providing responses that cannot possibly be true because they stem from logical incoherence. We call such responses "strong hallucinations" and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations. Focusing on negation, we provide a novel solution in which negation is treated not as another element of a latent representation, but as an operation over an LM's latent representations that constrains how they may evolve. We show that our approach improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
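To make the idea of negation as an operation more concrete, here is a toy sketch for the cloze setting. It is not the authors' implementation. It assumes we already have a probability distribution over candidate fillers for the positive prompt, and it derives the distribution for the negated prompt by complementing and renormalizing those probabilities rather than querying the model again with "not" as one more token. The function names, the complement rule, and the example scores are all hypothetical and chosen only for illustration.

```python
def negate_distribution(pos_probs):
    """Derive filler probabilities for a negated cloze prompt.

    Illustrative assumption: each filler's weight becomes the complement
    (1 - p) of its probability under the positive prompt, and the weights
    are then renormalized so they sum to 1.
    """
    complements = {tok: 1.0 - p for tok, p in pos_probs.items()}
    total = sum(complements.values())
    if total == 0:
        # Happens when a single filler carries all the probability mass.
        raise ValueError("cannot negate a distribution with one certain filler")
    return {tok: weight / total for tok, weight in complements.items()}


def top(probs):
    """Return the filler with the highest probability."""
    return max(probs, key=probs.get)


# Hypothetical scores for the positive prompt "A robin is a [MASK]."
positive = {"bird": 0.90, "fish": 0.06, "tree": 0.04}

# Distribution for "A robin is not a [MASK]." obtained by operating on
# the positive distribution instead of prompting with the negation.
negated = negate_distribution(positive)
```

Under this sketch the preferred filler for the positive prompt can never also be the preferred filler for the negated prompt when the fillers have distinct probabilities, which is the kind of logical coherence the abstract asks for. A model that treats "not" as just another token offers no such guarantee.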

