CL · Oct 10, 2025

Enhancing Faithfulness in Abstractive Summarization via Span-Level Fine-Tuning

arXiv:2510.09915v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses unfaithful summaries, a problem for users who rely on automated text condensation. It is an incremental advance because it builds on existing fine-tuning methods.

The paper tackles hallucinations in abstractive summarization by fine-tuning large language models on span-level annotations of unfaithful content. All three tested techniques improve faithfulness, and unlikelihood training is the most effective.
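Training on span-level annotations requires mapping each annotated span onto the tokens of the summary. The sketch below is minimal and makes several assumptions. It assumes the annotator (GPT-4o in the paper) returns half-open character offsets for each unfaithful span. The whitespace tokenizer is a hypothetical stand-in for a subword tokenizer's offset mapping. Both function names are illustrative and do not come from the paper.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical format: half-open character range [start, end)


def whitespace_tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and keep each token's character offsets.

    Hypothetical stand-in for a real subword tokenizer's offset mapping.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_token_labels(text: str, unfaithful_spans: List[Span]) -> List[int]:
    """Label a token 1 if it overlaps any annotated unfaithful span, else 0."""
    labels = []
    for _, tok_start, tok_end in whitespace_tokenize(text):
        overlaps = any(tok_start < end and start < tok_end
                       for start, end in unfaithful_spans)
        labels.append(1 if overlaps else 0)
    return labels


summary = "The CEO resigned in 2019 after the merger."
# Suppose the annotator flagged "in 2019" as unsupported by the source.
start = summary.index("in 2019")
labels = spans_to_token_labels(summary, [(start, start + len("in 2019"))])
print(labels)  # [0, 0, 0, 1, 1, 0, 0, 0]
```

The sketch flags a token on any overlap, not only full containment, so a subword token that straddles a span boundary is still marked. A stricter containment rule would be an equally reasonable choice.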

Abstractive summarization using large language models (LLMs) has become an essential tool for condensing information. However, despite their ability to generate fluent summaries, these models sometimes produce unfaithful summaries, introducing hallucinations at the word, phrase, or concept level. Existing mitigation strategies, such as post-processing corrections or contrastive learning with synthetically generated negative samples, fail to fully address the diverse errors that can occur in LLM-generated summaries. In this paper, we investigate fine-tuning strategies to reduce the occurrence of unfaithful spans in generated summaries. First, we automatically generate summaries for the source documents in the training set with a variety of LLMs and then use GPT-4o to annotate any hallucinations it detects at the span level. Leveraging these annotations, we fine-tune LLMs with both hallucination-free summaries and annotated unfaithful spans to enhance model faithfulness. We introduce a new dataset that contains both faithful and unfaithful summaries with span-level labels, and we evaluate three techniques for fine-tuning an LLM to improve the faithfulness of the resulting summaries: gradient ascent, unlikelihood training, and task vector negation. Experimental results show that all three approaches successfully leverage span-level annotations to improve faithfulness, with unlikelihood training being the most effective.
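The three fine-tuning objectives named in the abstract can be illustrated on per-token probabilities. This is a minimal, framework-free sketch, not the authors' implementation, and it rests on the following assumptions:

- Faithful tokens keep the standard negative log-likelihood.
- Flagged tokens receive one of two penalties: a negated log-likelihood (gradient ascent) or the term -log(1 - p) from the unlikelihood objective of Welleck et al.
- Task vector negation follows the task-arithmetic recipe: subtract a scaled difference between a model fine-tuned on unfaithful summaries and the base model.

The function names, the `alpha` and `lam` weights, and the dictionary-of-floats stand-in for model parameters are all hypothetical.

```python
import math
from typing import Dict, List

EPS = 1e-12  # floor inside log() so p = 0 or p = 1 does not raise


def _safe_log(x: float) -> float:
    return math.log(max(x, EPS))


def span_level_loss(token_probs: List[float], labels: List[int],
                    mode: str = "unlikelihood", alpha: float = 1.0) -> float:
    """Mean per-token loss for one summary (hypothetical helper).

    token_probs[i]: probability the model assigns to the i-th summary token
                    under teacher forcing.
    labels[i]:      1 if that token lies in an annotated unfaithful span, else 0.
    Faithful tokens always get the usual negative log-likelihood; `mode`
    chooses how flagged tokens are penalised, and `alpha` weights that penalty.
    """
    if mode not in ("gradient_ascent", "unlikelihood"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0.0
    for p, is_unfaithful in zip(token_probs, labels):
        if not is_unfaithful:
            total += -_safe_log(p)                # standard MLE term
        elif mode == "gradient_ascent":
            total += alpha * _safe_log(p)         # negated NLL: keeps falling as p -> 0
        else:
            total += -alpha * _safe_log(1.0 - p)  # unlikelihood: >= 0, zero at p = 0
    return total / len(token_probs)


def negate_task_vector(base: Dict[str, float],
                       tuned_on_unfaithful: Dict[str, float],
                       lam: float = 1.0) -> Dict[str, float]:
    """Task-arithmetic negation, applied parameter by parameter.

    tau = theta_unfaithful - theta_base is the direction learned by fine-tuning
    on unfaithful summaries; subtracting lam * tau moves the base model away from it.
    """
    return {name: base[name] - lam * (tuned_on_unfaithful[name] - base[name])
            for name in base}


probs = [0.9, 0.8, 0.5]   # the last token lies inside a flagged span
labels = [0, 0, 1]
print(span_level_loss(probs, labels, mode="unlikelihood"))     # about 0.3406
print(span_level_loss(probs, labels, mode="gradient_ascent"))  # about -0.1215
print(negate_task_vector({"w": 1.0}, {"w": 1.5}, lam=1.0))     # {'w': 0.5}
```

In a real training loop these terms would be computed on logit tensors inside an autograd framework. The scalar version only shows which tokens receive which penalty. The unlikelihood term is bounded below by zero, whereas the gradient-ascent term keeps decreasing as p approaches 0. This is one commonly cited reason why gradient ascent can be harder to keep stable.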
