CL
Jan 20, 2025

Multi-round, Chain-of-thought Post-editing for Unfaithful Summaries

arXiv:2501.11273v1
1 citation
h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses factual inconsistencies in generated summaries, which matters to users who rely on accurate information. The advance is incremental because it builds on existing LLM capabilities.

The paper tackles the problem of improving faithfulness in news summarization by using LLMs as post-editors with chain-of-thought prompts, achieving a higher editing success rate than prior work and performing comparably to fine-tuned models.

Recent large language models (LLMs) have demonstrated a remarkable ability to perform natural language understanding and generation tasks. In this work, we investigate the use of LLMs for evaluating faithfulness in news summarization, finding that this approach achieves a strong correlation with human judgments. We further investigate LLMs' capabilities as faithfulness post-editors, experimenting with different chain-of-thought prompts for locating and correcting factual inconsistencies between a generated summary and the source news document, and we achieve a higher editing success rate than was reported in prior work. We perform both automated and human evaluations of the post-edited summaries, finding that prompting LLMs with chain-of-thought reasoning about factual error types is an effective faithfulness post-editing strategy that performs comparably to fine-tuned post-editing models. We also demonstrate that multiple rounds of post-editing, which have not previously been explored, can be used to gradually improve the faithfulness of summaries whose errors cannot be fully corrected in a single round.
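The multi-round idea in the abstract can be illustrated with a short control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `multi_round_post_edit`, its parameters, and the toy evaluator and editor are all hypothetical names introduced here. In practice, `is_faithful` would wrap an LLM-based faithfulness evaluator and `edit_once` would wrap a chain-of-thought post-editing prompt; the stand-ins below only mimic an editor that fixes one error per round.

```python
from typing import Callable, List, Tuple

def multi_round_post_edit(
    source: str,
    summary: str,
    is_faithful: Callable[[str, str], bool],   # hypothetical evaluator hook
    edit_once: Callable[[str, str], str],      # hypothetical post-editor hook
    max_rounds: int = 3,                       # assumed cap, not from the paper
) -> Tuple[str, List[str]]:
    """Re-edit a summary until it is judged faithful or the round cap is hit."""
    history = [summary]
    for _ in range(max_rounds):
        if is_faithful(source, history[-1]):
            break  # nothing left to correct
        revised = edit_once(source, history[-1])
        if revised == history[-1]:
            break  # the editor made no change, so further rounds will not help
        history.append(revised)
    return history[-1], history

# Toy stand-ins so the sketch runs without any model access.
SOURCE = "The company hired 20 people in March."
DRAFT = "The company hired 200 people in May."

def toy_is_faithful(source: str, summary: str) -> bool:
    # Crude proxy: every summary word must also appear in the source.
    source_words = source.split()
    return all(word in source_words for word in summary.split())

def toy_edit_once(source: str, summary: str) -> str:
    # Mimics an editor that corrects only the first error it finds per round.
    fixes = {"200": "20", "May.": "March."}
    words = summary.split()
    for i, word in enumerate(words):
        if word in fixes:
            words[i] = fixes[word]
            break
    return " ".join(words)

final, history = multi_round_post_edit(SOURCE, DRAFT, toy_is_faithful, toy_edit_once)
print(final)
print(len(history) - 1, "edit rounds applied")
```

With the toy editor, the draft contains two errors and one round leaves one of them in place, which is the situation the abstract describes: a summary whose errors cannot be fully corrected in a single round. Stopping when the editor returns an unchanged summary is a design choice made here to avoid spending rounds on a stalled edit.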

