CL · AI · LG · Apr 2, 2024

Hallucination Diversity-Aware Active Learning for Text Summarization

arXiv:2404.01588v1 · 31 citations · h-index: 18 · NAACL
Originality: Incremental advance
AI Analysis

This paper addresses the problem of diverse hallucination types in LLM-generated text summaries. The approach is novel, but the efficiency improvements are incremental.

The paper tackles the problem of hallucinations in LLM-generated text summaries by proposing an active learning framework that reduces the need for costly human annotations, achieving effective and efficient mitigation across three datasets and different backbone models.

Large Language Models (LLMs) have shown a propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which limits their effectiveness in addressing the various types of hallucinations exhibited in LLM outputs. To the best of our knowledge, this paper proposes the first active learning framework to alleviate LLM hallucinations, reducing the costly human annotation of hallucinations that is needed. By measuring fine-grained hallucinations from errors in semantic frame, discourse, and content verifiability in text summarization, we propose HAllucination Diversity-Aware Sampling (HADAS) to select diverse hallucinations for annotation in active learning for LLM finetuning. Extensive experiments on three datasets and different backbone models demonstrate the advantages of our method in effectively and efficiently mitigating LLM hallucinations.
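
The abstract does not give the HADAS scoring formula, so the following is only a minimal sketch of the general idea of diversity-aware selection, not the authors' method. It assumes each candidate summary already has three hallucination scores in [0, 1], one per error type named above (semantic frame, discourse, content verifiability). The function name `select_diverse`, the `diversity_weight` parameter, and the greedy "severity plus distance to already-selected samples" criterion are all illustrative assumptions.

```python
import math

def select_diverse(pool, budget, diversity_weight=1.0):
    """Greedily pick up to `budget` samples for annotation.

    `pool` maps a sample id to a tuple of hypothetical hallucination scores
    (semantic frame, discourse, content verifiability). This criterion is an
    assumption for illustration, not the formula from the paper.
    """
    selected = []
    remaining = dict(pool)
    while remaining and len(selected) < budget:
        def gain(name):
            scores = remaining[name]
            # Severity: average hallucination score across the three types.
            severity = sum(scores) / len(scores)
            if not selected:
                return severity
            # Novelty: distance to the closest already-selected score profile.
            novelty = min(math.dist(scores, pool[s]) for s in selected)
            return severity + diversity_weight * novelty
        best = max(remaining, key=gain)
        selected.append(best)
        del remaining[best]
    return selected

# Toy pool: "a" and "b" show almost the same kind of error,
# while "c" and "d" show different error types.
pool = {
    "a": (0.9, 0.0, 0.0),
    "b": (0.85, 0.0, 0.0),
    "c": (0.0, 0.8, 0.0),
    "d": (0.0, 0.0, 0.7),
}
print(select_diverse(pool, budget=3))  # ['a', 'c', 'd']
```

In this toy pool, the near-duplicate sample "b" is skipped in favor of samples that exhibit other hallucination types, which is the behavior a diversity-aware sampler is meant to encourage under a fixed annotation budget.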

