CL · AI · Oct 18, 2024

LoGU: Long-form Generation with Uncertainty Expressions

Cambridge
arXiv:2410.14309v4 · 21 citations · h-index: 10 · ACL
Originality: Incremental advance
AI Analysis

This work addresses factual inaccuracies in long-form AI-generated content, a problem that matters for real-world applications. The advance is incremental because it builds on existing uncertainty modeling approaches.

The paper tackles the problem of hallucinations in long-form generation by enabling models to express uncertainty, introducing the LoGU task and addressing challenges like uncertainty suppression and misalignment. The proposed method improves accuracy, reduces hallucinations, and maintains response comprehensiveness across three datasets.

While Large Language Models (LLMs) demonstrate impressive capabilities, they still tend to generate factually incorrect content (i.e., hallucinations). A promising approach to mitigate this issue is enabling models to express uncertainty when unsure. Previous research on uncertainty modeling has primarily focused on short-form QA, but real-world applications often require much longer responses. In this work, we introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: Uncertainty Suppression, where models hesitate to express uncertainty, and Uncertainty Misalignment, where models convey uncertainty inaccurately. To tackle these challenges, we propose a refinement-based data collection framework and a two-stage training pipeline. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. The collected data are then used in training through supervised fine-tuning (SFT) and direct preference optimization (DPO) to enhance uncertainty expression. Extensive experiments on three long-form instruction-following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
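
The claim-level refinement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `AtomicClaim` fields, the `supported` fact-check label, the uncertainty template, and the `prompt`/`chosen`/`rejected` record layout are all assumptions made for illustration. The idea shown is only the general one: split a response into atomic claims, replace unsupported claims with an uncertainty expression, and pair the refined response with the original as preference data.

```python
from dataclasses import dataclass


@dataclass
class AtomicClaim:
    text: str        # one self-contained factual statement
    aspect: str      # hypothetical short description of what the claim is about
    supported: bool  # hypothetical fact-check label (True = verified)


def refine_response(claims):
    """Keep supported claims; swap unsupported ones for an uncertainty expression."""
    parts = []
    for claim in claims:
        if claim.supported:
            parts.append(claim.text)
        else:
            # Assumed template; a real system would phrase this more naturally.
            parts.append(f"I am not sure about {claim.aspect}.")
    return " ".join(parts)


def build_preference_pair(prompt, claims):
    """Pair the refined response (chosen) with the unrefined one (rejected)."""
    original = " ".join(claim.text for claim in claims)
    return {
        "prompt": prompt,
        "chosen": refine_response(claims),
        "rejected": original,
    }


claims = [
    AtomicClaim("Marie Curie won two Nobel Prizes.", "her Nobel Prizes", True),
    AtomicClaim("She was born in Paris.", "her birthplace", False),
]
pair = build_preference_pair("Tell me about Marie Curie.", claims)
print(pair["chosen"])
```

In this sketch the refined response could serve as a fine-tuning target and the chosen/rejected pair as a preference example; how the paper actually constructs and filters its data is not specified in the abstract.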

Code Implementations: 1 repo
